Typical support and Sanov large deviations of 
correlated states 

Igor Bjelakovic 2,3 Jean-Dominique Deuschel 2 
Tyll Krüger 1,2,4 Ruedi Seiler 2 
Rainer Siegmund-Schultze 1,2 
Arleta Szkola 1,2 

1 Max Planck Institute for Mathematics in the Sciences 
Inselstrasse 22, 04103 Leipzig, Germany 

2 Technische Universität Berlin 
Fakultät II - Mathematik und Naturwissenschaften 
Institut für Mathematik MA 7-2 
Straße des 17. Juni 136, 10623 Berlin, Germany 

3 Heinrich-Hertz-Chair for Mobile Communication 
Technische Universität Berlin 
Werner-von-Siemens-Bau (HFT 6) 
Einsteinufer 25, 10587 Berlin, Germany 

4 Universität Bielefeld 
Fakultät für Physik 
Universitätsstr. 25, 33619 Bielefeld, Germany 

February 2, 2008 



Abstract 

Discrete stationary classical processes as well as quantum lattice states 
are asymptotically confined to their respective typical support, the expo- 
nential growth rate of which is given by the (maximal ergodic) entropy. 
In the iid case the distinguishability of typical supports can be asymp- 
totically specified by means of the relative entropy, according to Sanov's 
theorem. We give an extension to the correlated case, referring to the 
newly introduced class of HP-states. 



1 Introduction 

A relevant notion on the interface of classical discrete probability theory and 
information theory is that of typical subsets. For the quantum extensions of 
these fields there is a corresponding notion: typical subspaces. 
The general picture is that a stationary process (a state, in the case of quantum 
lattice systems) is asymptotically, i.e. when observed on a large finite interval, more and 
more confined to its typical support. The size of this support has an exponential 
growth rate (possibly zero) given by the essential supremum of the entropies of 
the ergodic components. In the classical situation this is the content of the 
Shannon-McMillan theorem. It clarifies the importance of Shannon entropy for 
several fields, from data transmission and compression to statistical mechanics 
or complexity theory. 
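
The growth rate of the typical support can be checked numerically in the simplest (iid) situation. The following is a minimal sketch, assuming a Bernoulli(p) source; the function names `typical_set_rate` and `entropy` and the chosen parameters are illustrative and do not appear in the paper. It builds a high-probability set from whole type classes and compares the logarithm of its size, divided by n, with the Shannon entropy.

```python
import math

def log_binom(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)

def entropy(p):
    """Shannon entropy (base e) of a Bernoulli(p) distribution, 0 < p < 1."""
    return -p * math.log(p) - (1 - p) * math.log(1 - p)

def typical_set_rate(p, n, eps):
    """Return (1/n) * log(size) of a high-probability set for Bernoulli(p)^n.

    Whole type classes (all sequences with k ones) are added in order of
    decreasing per-sequence probability until the mass reaches 1 - eps.
    """
    # For p <= 1/2 sequences with fewer ones are more probable, and vice versa.
    order = range(n + 1) if p <= 0.5 else range(n, -1, -1)
    mass, log_sizes = 0.0, []
    for k in order:
        log_class_size = log_binom(n, k)
        log_seq_prob = k * math.log(p) + (n - k) * math.log(1 - p)
        mass += math.exp(log_class_size + log_seq_prob)
        log_sizes.append(log_class_size)
        if mass >= 1 - eps:
            break
    # Stable log of the total number of collected sequences.
    top = max(log_sizes)
    log_size = top + math.log(sum(math.exp(x - top) for x in log_sizes))
    return log_size / n
```

For moderate n the computed rate is only close to the entropy, not equal to it; the gap is expected to shrink slowly as n grows.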

Under the much stronger condition of complete independence Sanov's the- 
orem (see [21] or [7]) specifies the exponential rate of this confinement of a 
classical iid process to its own typical set, or equivalently, the rate of avoidance 
of the supports of all other iid processes' typical sets. This large deviations 
result is usually seen as a result on empirical distributions, as in its formula- 
tion a particular instance of typical set appears: typical for an iid process are 
realizations with an empirical distribution close to the probability distribution 
underlying this very process, see ch. 3.2 in Deuschel and Stroock [S]. 

In the iid case Sanov's theorem significantly extends the assertion of the 
Shannon-McMillan theorem. In fact, taking the equidistribution as reference 
measure, it follows from Sanov's theorem that there is a universal typical set 
sequence of approximate size $e^{nh}$ for all iid processes with (base e) entropy less 
than h. It is well-known in the classical situation that this extends to the general 
ergodic case (since there exist universal compression schemes like the Lempel- 
Ziv algorithm). This universality result was generalized to the quantum case by 
Kaltchenko and Yang [14], using a nice 'rotation technique' and the quantum 
Shannon-McMillan theorem pQ. 

From the point of view of statistical hypothesis testing Sanov's theorem as- 
serts that there is a universally typical set sequence for any set of iid probability 
distributions (null hypothesis), separating it optimally from any other set of iid 
processes (alternative hypothesis) at a rate arbitrarily close to the infimum of 
the relative entropies between probability measures from the two hypotheses. 
So in the classical case Sanov's theorem expresses a twofold universality in the 
choice of the typical sets. 

The special case of Sanov's theorem with both hypotheses consisting of only 
one probability distribution each, is usually called Stein's lemma. 

As already emphasized in [2], when passing from the classical to the quantum 
case, the universality mentioned above gets partially lost: there exists no longer 
a sequence of typical subspaces (of the underlying finite dimensional Hilbert 
spaces for the n-blocks of the system), which would work universally, what- 
ever the reference states are. Consequently, speaking in the hypothesis testing 
terminology, for the alternative hypothesis only one process/state is admitted 
here. Universality with respect to the null hypothesis states is maintained, how- 
ever. Also, in the quantum situation it is no longer possible to base Sanov's 
theorem on the concept of empirical distributions (states), see [2], chapter 4. 

We mention here that the main techniques needed to generalize Sanov's 
theorem to the iid quantum case were already presented in Hayashi [10] (1997), 
and in Hayashi [11] (2002) an equivalent result is shown. The authors of the 
present paper regrettably were not aware of this part of Hayashi's work during 
the preparation of [2]. 

It is the aim of this paper as a continuation of [5] to extend the assertion of 
Sanov's theorem in several directions. This concerns the classical case, too, but 
the main focus is on the quantum situation. 

First, the restriction to the uncorrelated case is substantially alleviated: No 
condition besides stationarity is imposed on the processes/states P of the null 
hypothesis. As for the alternative hypothesis (reference measure/state) Q, even 
stationarity is not assumed. The only requirements are the existence of relative 
entropy rates $h(W, Q) \le +\infty$ for the ergodic components W occurring in the 
null hypothesis set, and the validity of the upper bound (achievability part) in 
Stein's lemma concerning W and Q (see Theorems [Til resp. [TBI in the classical 
situation). These are, in a sense, minimal requirements, since Stein's lemma is a 
trivial consequence of Sanov's theorem obtained by forgetting about universality. 

As an application of this general result we consider the case that a certain 
(admittedly very strong) mixing condition holds for the reference process Q. 
Observe that the very existence of the relative entropy rate for correlated pro- 
cesses can only be guaranteed in terms of mixing conditions, if the reference 
process is particularly strongly mixing. Shields [24] gives an example where the 
reference process is even maximally mixing in the sense of Dynamical Systems 
theory (B-process, i.e. isomorphic to an iid process), but nonetheless there ex- 
ists no asymptotic rate of the relative entropy. Though the mixing condition 
upon the reference processes is very strong (*-mixing, cf. [4], or [5], where it is 
called $\psi$-mixing), the class of aperiodic irreducible Markov processes on a finite 
state space is covered. In this Markov case aperiodicity is necessary and suffi- 
cient for mixing, but not needed for Sanov's theorem, showing that *-mixing is 
far from being a necessary condition for a Sanov type theorem. In fact, in the 
classical case a kind of average mixing would yield the result, cf. condition (U) 
on page 86 of [9]. 

We also mention that in the classical case a usual mixing condition to derive 
large deviation results is hypermixing, cf. chapter 5.4 in [3]. 

Secondly, we generalize the classical Sanov's theorem to the (correlated) 
quantum situation (in Hayashi [11] and later in [5] the quantum iid case was 
considered). In fact, since the classical assertion is a special case of the quantum 
theorem, we only prove the latter. Again, the reference state only needs to fulfil 
the two minimal conditions mentioned above. We refer to those as HP-condition. 
The states forming the null hypothesis have to be stationary only. 

It would be interesting to specify the set of all states which fulfil the HP- 
condition with respect to any ergodic (null hypothesis) state. We call these 
states HP-states. As already said, this set comprises all *-mixing states, but 
can be expected to be much larger. 

In the classical situation we remind the reader of an interesting example by 
Xu [25]: There exists a B-process (i.e. again maximally mixing in the sense of 
Dynamical Systems) Q which has the property that the relative entropy rate 
h(P, Q) exists and is zero for any stationary process P. So this process cannot 
be separated at exponential speed from an arbitrary other stationary process. 
It would be interesting to find conditions weaker than *-mixing ensuring 
exponential separability in the case that the relative entropy rate is positive. 

In the presented form, the quantum Sanov's theorem comprises and extends 
several earlier results on typical subspaces and their connection with the von 
Neumann entropy and relative entropy. 

In particular, the result [2] of the present authors (which was preceded 
by Hayashi [11]) is extended to the correlated case. The quantum Shannon- 
McMillan theorem p] is covered and extended from the ergodic to the general 
stationary situation. The universality result of Kaltchenko and Yang [14] is 
covered, too, by using the tracial state as reference state in Theorem [TTJ In 
fact this Kaltchenko- Yang universality is a main ingredient in our proof. The 
quantum Stein's lemma (see [19], [20], chapter 1.1 for the iid case, [3] for the 
case of ergodic null hypothesis states) is covered and extended to the case of 
correlated reference states. Results of Hiai and Petz [12], [13] are completed 
in the sense that their bound is shown to be sharp, which means that it is 
asymptotically optimal, and the condition of complete ergodicity concerning 
the null hypothesis is dropped. In particular, the case of irreducible aperiodic 
algebraic (reference) states on a quasi-local algebra over a finite-dimensional 
C*-algebra A (also called finitely correlated states) considered in [13] is covered 
by *-mixing. We mention that Hiai and Petz emphasize in [13] that they derive 
almost all assertions using *-mixing only. 

As already emphasized, the quantum Sanov's theorem is a special type of 
quantum large deviations result. We refer the reader to some other work in 
this direction, see Lebowitz, Lenci and Spohn [15], Lenci and Rey-Bellet [16], 
Netocny and Redig [15], De Roeck, Maes and Netocny [8]. 

We give a short account of the principal steps to prove the main result. 

In chapter 3 we show that 'one half' of Stein's lemma, namely the assumed 
achievability of the relative entropy rate as separation rate, already implies 
Sanov's theorem. 

First it is shown that the optimality of the relative entropy rate (as sep- 
aration rate) is a consequence of its achievability. In fact, for two states $\Psi$, $\Phi$ 
such that $s(\Psi, \Phi)$ and $s(\Psi)$ exist, the quantity $-n(s(\Psi, \Phi) + s(\Psi))$ is the asymp- 
totic average of the logarithmic eigenvalues of $D_{\Phi^{(n)}}$, which denotes the density 
operator of the local state $\Phi^{(n)}$ on the discrete interval of length n, with re- 
spect to the probability measure generated on the corresponding eigenvectors 
by the operator $D_{\Psi^{(n)}}$. On the other hand, the achievability part of Stein's 
lemma implies that $-n(s(\Psi, \Phi) + s(\Psi))$ is also an essential upper bound for 
these logarithmic eigenvalues. The key tool to show the latter is Lemma 8. 
Now, roughly speaking, with the asymptotic average being the asymptotic up- 
per bound, it must be an asymptotic lower bound, too. This observation yields 
a relative AEP (asymptotic equipartition property) for the logarithmic eigen- 
values of $D_{\Phi^{(n)}}$: the vast majority of them (with respect to the considered 
probability distribution) is close to $-n(s(\Psi, \Phi) + s(\Psi))$. Because for ergodic $\Psi$ 
by the quantum Shannon-McMillan theorem the relevant dimension of the cor- 
responding subspace of eigenvectors of $D_{\Phi^{(n)}}$ is close to $e^{n s(\Psi)}$, it easily follows 
now (applying Lemma 8 once again) that the optimally separating subspaces 
can essentially be described as those which are close to the span of the men- 
tioned eigenvectors of $D_{\Phi^{(n)}}$ fulfilling the relative AEP: the $\Phi^{(n)}$-expectation is 
close to $e^{n s(\Psi)} \cdot e^{-n(s(\Psi, \Phi) + s(\Psi))} = e^{-n s(\Psi, \Phi)}$. 

Next we make use of the proven relative AEP, combined with Kaltchenko 
and Yang's universality result, to show Sanov's theorem: We subdivide the null 
hypothesis set into small slices of almost constant value of the 'mixed' term (aka 
cross-entropy) 

$$s_{\mathrm{mix}} := s(\Psi, \Phi) + s(\Psi) = -\lim_{n\to\infty} \frac{1}{n} \mathrm{Tr}\, D_{\Psi^{(n)}} \log D_{\Phi^{(n)}},$$

and within these slices the entropy rate is bounded from above by 
$s_{\mathrm{mix}} - \inf_{\Psi} s(\Psi, \Phi)$. Then, by Kaltchenko-Yang universality, there exists a 
common support of dimension $\approx e^{n(s_{\mathrm{mix}} - \inf_{\Psi} s(\Psi, \Phi))}$, which by the relative AEP 
can be chosen to consist of eigenvectors of $D_{\Phi^{(n)}}$ with eigenvalues close to 
$e^{-n s_{\mathrm{mix}}}$. So this common support has an asymptotic $\Phi^{(n)}$-expectation close to 

$$e^{n(s_{\mathrm{mix}} - \inf_{\Psi} s(\Psi, \Phi))} \cdot e^{-n s_{\mathrm{mix}}} = e^{-n \inf_{\Psi} s(\Psi, \Phi)}.$$

This essentially proves Sanov's theorem under the HP-condition. 

In chapter 4 we prove that *-mixing implies the HP-condition, hence the 
quantum Sanov's theorem. The idea is borrowed from [13]: under *-mixing, 
the reference state $\Phi$ is sufficiently close to some block-iid state, so that we may 
apply the techniques developed in [2] and [5] in order to prove the achievability 
part of Stein's lemma. 

In chapter 5 we use the ergodic decomposition of stationary states to ex- 
tend our results to the case where the null hypothesis states are only assumed 
stationary. 

2 Basic settings and notations 

As announced in the introduction, we address both the classical and the quan- 
tum situation. Let us first consider the classical case. Let a finite set A of 
symbols be given. We deal with processes P on $[A^{\mathbb{Z}}, \mathfrak{A}^{\mathbb{Z}}]$, where $\mathfrak{A}^{\mathbb{Z}}$ denotes the 
σ-field over $A^{\mathbb{Z}}$ which is generated by finite dimensional cylinders. We denote 
the set of all processes by $\mathcal{P}(A^{\mathbb{Z}})$. Let $P^{(n)}$ denote the marginal of a process P, 
restricted to the positive (time) indices $\{0, 1, \dots, n-1\} \subset \mathbb{Z}$. 

The relative entropy rate between two processes P, Q is defined as 

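$$h(P, Q) := \lim_{n\to\infty} \frac{1}{n} H(P^{(n)}, Q^{(n)})$$

whenever this limit exists in $\overline{\mathbb{R}}_+ = \mathbb{R}_+ \cup \{+\infty\}$. Here $H(\cdot,\cdot)$ denotes the relative 
entropy of two probability measures given on a finite set. 

If $Q \in \mathcal{P}(A^{\mathbb{Z}})$, $\Omega \subseteq \mathcal{P}(A^{\mathbb{Z}})$ and $h(P, Q)$ exists for each $P \in \Omega$ we write 
$h(\Omega, Q)$ for $\inf_{P \in \Omega} h(P, Q)$. 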
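
To make the definition concrete, the following sketch computes the relative entropy of two distributions on a finite set and checks, for two iid processes, that the n-block relative entropy divided by n equals the one-symbol relative entropy, so that the rate trivially exists in this case. The helper names `relative_entropy` and `iid_marginal` and the two-letter alphabet in the usage note are assumptions made for illustration only.

```python
import math
from itertools import product

def relative_entropy(p, q):
    """H(p, q) = sum_x p(x) log(p(x)/q(x)) for distributions given as dicts."""
    total = 0.0
    for x, px in p.items():
        if px == 0.0:
            continue  # terms with p(x) = 0 contribute nothing
        qx = q.get(x, 0.0)
        if qx == 0.0:
            return math.inf  # p is not absolutely continuous w.r.t. q
        total += px * math.log(px / qx)
    return total

def iid_marginal(p, n):
    """n-block marginal of the iid process with one-symbol distribution p."""
    return {word: math.prod(p[a] for a in word)
            for word in product(p, repeat=n)}
```

For example, with `p = {"a": 0.3, "b": 0.7}` and `q = {"a": 0.6, "b": 0.4}`, the value `relative_entropy(iid_marginal(p, 5), iid_marginal(q, 5)) / 5` agrees with `relative_entropy(p, q)` up to rounding. For correlated processes no such identity is available, which is why the existence of the limit has to be required.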

The following very strong mixing property of Q was introduced by Blum, 
Hanson and Koopmans [4] (referred to as $\psi$-mixing in the survey paper [5]), 
which implies the existence of the relative entropy rate h(P, Q) for any stationary 
P (see [13], where the more general quantum case is treated): 
Definition 1 A stationary process Q on $[A^{\mathbb{Z}}, \mathfrak{A}^{\mathbb{Z}}]$ will be called *-mixing if for 
each $0 < \alpha < 1$ there exists an $l \in \mathbb{N}$ such that 

$$\alpha\, Q(B) Q(C) \le Q(B \cap C) \le \alpha^{-1} Q(B) Q(C) \tag{1}$$

whenever $B \in \mathfrak{A}^{\{\dots,-2,-1,0\}}$, $C \in \mathfrak{A}^{\{l, l+1, \dots\}}$. 

Here $\mathfrak{A}^{T}$, $T \subseteq \mathbb{Z}$, denotes the sub-σ-field of $\mathfrak{A}^{\mathbb{Z}}$ concerning only times $t \in T$. 

Observe that irreducible and aperiodic (i.e. weakly mixing) Markov chains 
are automatically *-mixing, even with $\alpha = \alpha(l)$ tending to 1 exponentially 
fast as $l \to \infty$. In the general situation, even strong mixing (α-mixing in the 
terminology of [5]) does not imply *-mixing, because rare events may still deviate 
much from independence. 
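
The Markov case can be made explicit. For a stationary Markov chain with transition matrix P and strictly positive stationary distribution π, the Markov property implies that the ratio Q(B ∩ C) / (Q(B) Q(C)) in (1) always lies between the smallest and the largest value of $P^l(i,j)/\pi(j)$. The sketch below, with hypothetical helper names and an arbitrarily chosen two-state chain in the usage note, computes the resulting best constant α for a given gap l.

```python
def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    size = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]

def mat_power(m, l):
    """l-th power of a square matrix by repeated multiplication."""
    size = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(l):
        result = mat_mul(result, m)
    return result

def star_mixing_alpha(transition, stationary, gap):
    """Largest alpha for which (1) holds at the given gap l = gap.

    Assumes a stationary Markov chain whose stationary distribution has
    strictly positive entries.
    """
    p_l = mat_power(transition, gap)
    size = len(stationary)
    ratios = [p_l[i][j] / stationary[j] for i in range(size) for j in range(size)]
    return min(min(ratios), 1.0 / max(ratios))
```

For the chain with transition matrix `[[0.9, 0.1], [0.2, 0.8]]` and stationary distribution `[2/3, 1/3]`, the second eigenvalue is 0.7, and the constant returned for gap l approaches 1 at the geometric rate $0.7^l$, in line with the exponential convergence stated above.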

We note that in the following we use the seemingly weaker condition that 
(1) is fulfilled for some $\alpha > 0$ and some $l$. (The same was emphasized for most 
of the results in [13].) But, in fact, in the stationary classical situation this is 
already equivalent to full *-mixing, see [5], Theorem 4.1. 

Let $\mathcal{P}_{\mathrm{stat}}(A^{\mathbb{Z}})$, $\mathcal{P}_{\mathrm{erg}}(A^{\mathbb{Z}})$ resp. $\mathcal{P}_{*}(A^{\mathbb{Z}})$ denote the set of stationary, of ergodic 
resp. stationary *-mixing processes with state space A. 

We briefly introduce now the corresponding quantum set-up. 

Consider a finite-dimensional C*-algebra $\mathcal{A}$. The classical case is covered 
by choosing $\mathcal{A}$ to be abelian. 

It is well-known that $\mathcal{A}$ can always be represented as a finite direct sum of 
matrix algebras 

$$\mathcal{A} = \bigoplus_{i=1}^{m} M_{k_i}, \tag{2}$$

where $M_k$ is the algebra of complex $k \times k$ matrices. The abelian case is covered 
if all $k_i$ are 1, meaning that $\mathcal{A}$ is simply the commutative algebra of complex 
functions over a finite set $A = \{1, 2, \dots, m\}$. A state $\psi$ on $\mathcal{A}$ is a positive func- 
tional on $\mathcal{A}$ with the property $\psi(\mathbf{1}) = 1$, where $\mathbf{1}$ is the unity. The set of all 
states on $\mathcal{A}$ is denoted by $\mathcal{S}(\mathcal{A})$. This is the set of probability measures on A in 
the abelian case. 

Any state $\psi$ on the finite-dimensional algebra $\mathcal{A}$ is uniquely given by its 
density operator $D_\psi \in \mathcal{A}$, which is a positive trace-one operator fulfilling 

$$\psi(X) = \mathrm{Tr}_{\mathcal{A}}(D_\psi X) \quad \text{for each } X \in \mathcal{A}.$$

Here $\mathrm{Tr}_{\mathcal{A}}$ denotes the canonical trace in $\mathcal{A}$ which is nothing but the sum of the 
matrix traces in the above representation (2). 

The quantum generalization of a stochastic process is usually constructed as 
follows (and in correspondence to the definition of a process by its compatible 
finite-dimensional distributions via Kolmogorov's extension theorem): For each 
finite subset $T \subset \mathbb{Z}$ consider the C*-algebra $\mathcal{A}^{T} := \bigotimes_{t \in T} \mathcal{A}$. Then for any 
$T \subset T' \subset \mathbb{Z}$ there is a canonical embedding of $\mathcal{A}^{T}$ into $\mathcal{A}^{T'}$ as a C*-subalgebra. 
With respect to this identification consider the algebra 

$$\mathcal{A}^{\infty} := \bigcup_{T \subset \mathbb{Z},\ T \text{ finite}} \mathcal{A}^{T} = \bigcup_{n \in \mathbb{N}} \mathcal{A}^{\{-n, \dots, n\}}.$$

$\mathcal{A}^{\infty}$ is not norm-complete. We denote the completion by $\mathcal{A}^{\mathbb{Z}}$. It is a C*-algebra 
and is called the quasilocal algebra constructed from $\mathcal{A}$. Again, a state $\Psi$ on $\mathcal{A}^{\mathbb{Z}}$ 
is a positive functional on $\mathcal{A}^{\mathbb{Z}}$ with the property $\Psi(\mathbf{1}) = 1$. If $\mathcal{A}$ is abelian, there 
is a one-to-one correspondence between states on $\mathcal{A}^{\mathbb{Z}}$ and stochastic processes 
with alphabet A: The restrictions $\Psi^{(T)} := \Psi|_{\mathcal{A}^{T}}$ of $\Psi$ to the local algebras $\mathcal{A}^{T}$ 
correspond to the marginals $P^{(T)}$ of the stochastic process P on the cylinder σ- 
algebras $\mathfrak{A}^{T}$. This comes from the fact that any compatible family of local states 
$\Psi^{(T)}$ has a unique extension to $\mathcal{A}^{\mathbb{Z}}$ just as any compatible family of marginals 
can be extended to a stochastic process. 

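There is a canonically defined shift operator $\tau$ on $\mathcal{A}^{\mathbb{Z}}$ (mapping in particular 
$\mathcal{A}^{\{0\}} \subset \mathcal{A}^{\mathbb{Z}}$ onto $\mathcal{A}^{\{1\}} \subset \mathcal{A}^{\mathbb{Z}}$). The set of stationary states $\mathcal{S}_{\mathrm{stat}}(\mathcal{A}^{\mathbb{Z}})$ is the 
subset of states in $\mathcal{S}(\mathcal{A}^{\mathbb{Z}})$ which are invariant with respect to $\tau$. This is a 
Choquet simplex, the extremal points are called ergodic states $\mathcal{S}_{\mathrm{erg}}(\mathcal{A}^{\mathbb{Z}})$. The 
notions coincide with the classical ones in the abelian case. 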

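We complete the picture by defining a mixing property (cf. [13]) as above: 

Definition 2 A stationary state $\Phi$ in $\mathcal{S}_{\mathrm{stat}}(\mathcal{A}^{\mathbb{Z}})$ will be called *-mixing if for 
each $0 < \alpha < 1$ there exists an $l \in \mathbb{N}$ such that for each $k \in \mathbb{N}$ 

$$\begin{aligned}
\alpha\, \Phi^{(\{-k,\dots,0\})} \otimes \Phi^{(\{l,\dots,l+k\})} &\le \Phi^{(\{-k,\dots,0\} \cup \{l,\dots,l+k\})} \\
&\le \alpha^{-1}\, \Phi^{(\{-k,\dots,0\})} \otimes \Phi^{(\{l,\dots,l+k\})}.
\end{aligned}$$

We denote the set of stationary *-mixing states by $\mathcal{S}_{*}(\mathcal{A}^{\mathbb{Z}})$. 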
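Next we introduce the quantum version of the relative entropy rate. Let 
$\psi, \varphi \in \mathcal{S}(\mathcal{A})$. The relative entropy is defined as 

$$S(\psi, \varphi) := \begin{cases} \mathrm{Tr}_{\mathcal{A}}\, D_\psi (\log D_\psi - \log D_\varphi), & \text{if } \mathrm{supp}(\psi) \le \mathrm{supp}(\varphi), \\ \infty, & \text{otherwise.} \end{cases}$$

Here supp(D) is the smallest projection $p \in \mathcal{A}$ fulfilling $pDp = D$ (with $D \in \mathcal{A}$ 
self-adjoint). 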

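Now, for $\Psi, \Phi \in \mathcal{S}(\mathcal{A}^{\mathbb{Z}})$, we define the relative entropy rate 

$$s(\Psi, \Phi) := \lim_{n\to\infty} \frac{1}{n} S(\Psi^{(n)}, \Phi^{(n)}),$$

whenever this limit exists in $\overline{\mathbb{R}}_+$ (we write $\Psi^{(n)}$ for short instead of $\Psi^{(\{0,1,\dots,n-1\})}$ 
and $\mathcal{A}^{(n)}$ instead of $\mathcal{A}^{\{0,1,\dots,n-1\}}$). 

Again, if $\Phi \in \mathcal{S}(\mathcal{A}^{\mathbb{Z}})$, $\Omega \subseteq \mathcal{S}(\mathcal{A}^{\mathbb{Z}})$ and $s(\Psi, \Phi)$ exists for each $\Psi \in \Omega$ we write 
$s(\Omega, \Phi)$ for $\inf_{\Psi \in \Omega} s(\Psi, \Phi)$. 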
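
For orientation, in the uncorrelated case the rate reduces to the one-site relative entropy. The following short computation is a standard fact, added here only as an illustration; the one-site states $\psi, \varphi \in \mathcal{S}(\mathcal{A})$ with $\mathrm{supp}(\psi) \le \mathrm{supp}(\varphi)$ and the product states built from them are assumptions of the example, and logarithms are understood on the respective supports.

```latex
% Product (iid) states: \Psi^{(n)} = \psi^{\otimes n}, \Phi^{(n)} = \varphi^{\otimes n}
\begin{align*}
\log D_{\varphi}^{\otimes n}
  &= \sum_{k=1}^{n} \mathbf{1}^{\otimes (k-1)} \otimes \log D_{\varphi} \otimes \mathbf{1}^{\otimes (n-k)}, \\
S(\psi^{\otimes n}, \varphi^{\otimes n})
  &= \mathrm{Tr}\, D_{\psi}^{\otimes n} \bigl( \log D_{\psi}^{\otimes n} - \log D_{\varphi}^{\otimes n} \bigr)
   = n\, S(\psi, \varphi), \\
s(\Psi, \Phi)
  &= \lim_{n\to\infty} \frac{1}{n}\, n\, S(\psi, \varphi) = S(\psi, \varphi).
\end{align*}
```

The second line uses that each summand of the logarithm acts on a single tensor factor while the remaining factors of $D_{\psi}^{\otimes n}$ have trace one.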
3 Equivalence of Sanov's theorem and Stein's 
lemma 

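The maximally separating exponents for two states $\Psi$, $\Phi$ on $\mathcal{A}^{\mathbb{Z}}$ are defined by 

$$\beta_{\varepsilon,n}(\Psi, \Phi) := \min\{\log \Phi^{(n)}(q) : q \in \mathcal{A}^{(n)} \text{ projection},\ \Psi^{(n)}(q) \ge 1 - \varepsilon\},$$

for $\varepsilon \in (0,1)$. By $\overline{\beta}_{\varepsilon}(\Psi, \Phi)$ we denote $\limsup_{n\to\infty} \frac{1}{n} \beta_{\varepsilon,n}(\Psi, \Phi)$, and if the limit 
exists in $-\overline{\mathbb{R}}_+ := -[0,\infty]$, we denote it by $\beta_{\varepsilon}(\Psi, \Phi)$. 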

Definition 3 We say that the pair $(\Psi, \Phi)$ satisfies the HP-condition if the rel- 
ative entropy rate $s(\Psi, \Phi)$ exists and $\overline{\beta}_{\varepsilon}(\Psi, \Phi) \le -s(\Psi, \Phi)$ for all $\varepsilon \in (0, 1)$. 

This condition was first proved to be fulfilled by Hiai and Petz in [12] for 
the special case that $\Psi$ is completely ergodic and $\Phi$ is a stationary product state 
(i.e. an iid state) and later in [13] for completely ergodic $\Psi$ and *-mixing states 
$\Phi$. 

Definition 4 We say that $\Phi \in \mathcal{S}(\mathcal{A}^{\mathbb{Z}})$ is a HP-state if, for any ergodic state 
$\Psi \in \mathcal{S}_{\mathrm{erg}}(\mathcal{A}^{\mathbb{Z}})$, the pair $(\Psi, \Phi)$ satisfies the HP-condition. 

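As it turns out, the statement in Sanov's theorem is equivalent to the HP- 
condition: 

Theorem 5 Let $\Phi$ be a state on $\mathcal{A}^{\mathbb{Z}}$ and $\Theta \subseteq \mathcal{S}_{\mathrm{erg}}(\mathcal{A}^{\mathbb{Z}})$. Then the following state- 
ments are equivalent: 

1. For each $\Psi \in \Theta$ the pair $(\Psi, \Phi)$ satisfies the HP-condition. 

2. The quantity $s(\Psi, \Phi) \le +\infty$ exists for each $\Psi \in \Theta$, and to each subset 
$\Omega \subseteq \Theta$ and any $\eta > 0$ there exists a sequence $\{p_n\}_{n \in \mathbb{N}}$ of projections 
$p_n \in \mathcal{A}^{(n)}$ with 

$$\lim_{n\to\infty} \Psi^{(n)}(p_n) = 1 \quad \text{for all } \Psi \in \Omega, \tag{3}$$

such that if $s(\Omega, \Phi) < \infty$ 

$$\limsup_{n\to\infty} \frac{1}{n} \log \Phi^{(n)}(p_n) \le -s(\Omega, \Phi) + \eta, \tag{4}$$

and otherwise, if $s(\Omega, \Phi) = \infty$, 

$$\limsup_{n\to\infty} \frac{1}{n} \log \Phi^{(n)}(p_n) \le -\frac{1}{\eta}. \tag{5}$$

Moreover, for each sequence of projections $\{p_n\}$ fulfilling (3) we have 

$$\liminf_{n\to\infty} \frac{1}{n} \log \Phi^{(n)}(p_n) \ge -s(\Omega, \Phi).$$

Hence $-s(\Omega, \Phi)$ is the lower limit of all achievable separation exponents. 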
Remark 6 1. There are examples showing that in general one cannot choose 
$\eta = 0$, meaning that the exact value $-s(\Omega, \Phi)$ is not necessarily achievable. 

2. If $\Phi$ is stationary and, moreover, *-mixing, statement 1. of the Theorem 
is fulfilled with $\Theta = \mathcal{S}_{\mathrm{erg}}(\mathcal{A}^{\mathbb{Z}})$. This will be seen in section 4. 

The implication 2. $\Rightarrow$ 1. is trivial. The proof of the converse implication is 
carried out in subsection 3.2. 

As an immediate consequence, we have the following assertion for the clas- 
sical case: 

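Let the maximally separating exponents for two processes $P, Q \in \mathcal{P}(A^{\mathbb{Z}})$ be 
defined by 

$$\beta_{\varepsilon,n}(P, Q) := \min\{\log Q^{(n)}(M) : M \subseteq A^n,\ P^{(n)}(M) \ge 1 - \varepsilon\},$$

for $\varepsilon \in (0,1)$. By $\overline{\beta}_{\varepsilon}(P, Q)$ we denote $\limsup_{n\to\infty} \frac{1}{n} \beta_{\varepsilon,n}(P, Q)$, and if the limit 
exists in $-\overline{\mathbb{R}}_+$, we denote it by $\beta_{\varepsilon}(P, Q)$. 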

Theorem 7 Let $Q \in \mathcal{P}(A^{\mathbb{Z}})$, $\Theta \subseteq \mathcal{P}_{\mathrm{erg}}(A^{\mathbb{Z}})$ and suppose that the relative en- 
tropy rate $h(P, Q)$ exists for all $P \in \Theta$. Then the following statements are equiva- 
lent: 

1. $\overline{\beta}_{\varepsilon}(P, Q) \le -h(P, Q)$ for all $P \in \Theta$ and all $\varepsilon \in (0, 1)$. 

2. For each set $\Omega \subseteq \Theta$ and each $\eta > 0$ there is a sequence of subsets $\{M_n\}$, 
$M_n \subseteq A^n$, such that 

$$\lim_{n\to\infty} P^{(n)}(M_n) = 1 \quad \text{for all } P \in \Omega, \tag{6}$$

and 

$$\limsup_{n\to\infty} \frac{1}{n} \log Q^{(n)}(M_n) \le -h(\Omega, Q) + \eta$$

if $h(\Omega, Q) < \infty$, otherwise if $h(\Omega, Q) = \infty$ 

$$\limsup_{n\to\infty} \frac{1}{n} \log Q^{(n)}(M_n) \le -\frac{1}{\eta}.$$

Moreover, for each sequence of subsets $\{M_n\}$ fulfilling (6) we have 

$$\liminf_{n\to\infty} \frac{1}{n} \log Q^{(n)}(M_n) \ge -h(\Omega, Q).$$

Hence $-h(\Omega, Q)$ is the lower limit of all achievable separation exponents. 
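
For a single pair of iid processes the separation exponent can be explored numerically. The sketch below is only an illustration under simplifying assumptions: P and Q are Bernoulli processes with parameters 0 < p < q < 1, and the set M is built from whole type classes in likelihood-ratio order, so the returned value is an upper bound on $\frac{1}{n}\beta_{\varepsilon,n}(P, Q)$ rather than its exact value. All function names are hypothetical.

```python
import math

def log_binom_pmf(n, k, p):
    """log of P(K = k) for K ~ Binomial(n, p), 0 < p < 1."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))

def bernoulli_relative_entropy(p, q):
    """H(P, Q) for Bernoulli(p) and Bernoulli(q), base e."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def separation_exponent_upper_bound(p, q, n, eps):
    """(1/n) log Q^(n)(M) for a likelihood-ratio set M with P^(n)(M) >= 1 - eps.

    Assumes 0 < p < q < 1, so the likelihood ratio P/Q decreases in the number
    of ones k and M collects the type classes k = 0, 1, ..., k*.
    Intended for moderate n only (Q^(n)(M) must not underflow to zero).
    """
    p_mass, q_mass = 0.0, 0.0
    for k in range(n + 1):
        p_mass += math.exp(log_binom_pmf(n, k, p))
        q_mass += math.exp(log_binom_pmf(n, k, q))
        if p_mass >= 1 - eps:
            break
    return math.log(q_mass) / n
```

With p = 0.3, q = 0.6, n = 400 and ε = 0.1 the returned exponent lies slightly above $-H(P, Q) \approx -0.184$; the remaining gap reflects the finite block length.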

3.1 A quantum relative AEP and achievability in Stein's 
lemma 

We start with a useful lemma which allows one to translate some standard techniques 
and estimates used in classical information theory into the quantum setting. 

Lemma 8 Let p, q be arbitrary projections and $\tau$ be a state on $\mathcal{A}$. Suppose that 
u is a projection commuting with $D_\tau$. Then we have 

$$\tau(qpq) \ge \tau(qpqu) \ge \tau(p) - 2\,(\tau(\mathbf{1} - q))^{1/2} - \tau(\mathbf{1} - u). \tag{7}$$

Let $c > 0$. If $D_\tau u \le c\,u$ then 

$$\mathrm{Tr}(pq) \ge \frac{1}{c}\left(\tau(p) - 2\,(\tau(\mathbf{1} - q))^{1/2} - \tau(\mathbf{1} - u)\right). \tag{8}$$

Proof. The first inequality in (7) is trivial. The second follows applying the 
Cauchy-Schwarz inequality for the Hilbert-Schmidt inner product: 

$$\begin{aligned}
\tau(p) &= \tau(pq) + \tau(p(\mathbf{1} - q)) \\
&\le |\tau(pq)| + (\mathrm{Tr}(D_\tau(\mathbf{1} - q)))^{1/2} \\
&\le \tau(qpq) + |\tau((\mathbf{1} - q)pq)| + (\tau(\mathbf{1} - q))^{1/2} \\
&\le \tau(qpq) + 2\,(\tau(\mathbf{1} - q))^{1/2} \\
&= \tau(qpqu) + \tau(qpq(\mathbf{1} - u)) + 2\,(\tau(\mathbf{1} - q))^{1/2} \\
&\le \tau(qpqu) + \tau(\mathbf{1} - u) + 2\,(\tau(\mathbf{1} - q))^{1/2}.
\end{aligned}$$

In the last inequality the assumption $[u, D_\tau] = 0$ has been used. Finally observe 
that $u \ge \frac{1}{c} D_\tau u$ and 

$$\mathrm{Tr}(pq) = \mathrm{Tr}(qpq) = \mathrm{Tr}(qpqu) + \mathrm{Tr}(qpq(\mathbf{1} - u)) \ge \frac{1}{c}\,\mathrm{Tr}(qpq D_\tau u). \tag{9}$$

Inequality (8) follows immediately inserting (7) into eqn. (9). ■ 
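For $0 \le s < \infty$, write $u^{\varepsilon}_{\Phi^{(n)}}(s)$ for the finite direct sum 

$$u^{\varepsilon}_{\Phi^{(n)}}(s) := \sum_{s - \varepsilon < s' < s + \varepsilon} \mathrm{spec}_{e^{-ns'}}(\Phi^{(n)}),$$

where $\mathrm{spec}_{\lambda}(\cdot)$ denotes the eigen-projection of its argument's density operator 
(here $D_{\Phi^{(n)}}$) corresponding to the eigenvalue $\lambda$. We extend this definition to 
the case $s = \infty$ by setting 

$$u^{\varepsilon}_{\Phi^{(n)}}(\infty) := \mathrm{spec}_{0}(\Phi^{(n)}) + \sum_{\varepsilon^{-1} < s'} \mathrm{spec}_{e^{-ns'}}(\Phi^{(n)}). \tag{10}$$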

Now we have 

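Proposition 9 Let $\Psi$ be ergodic and let $\Phi$ be an arbitrary state on $\mathcal{A}^{\mathbb{Z}}$. If the 
pair $(\Psi, \Phi)$ fulfills the HP-condition then: 

• $\beta_{\varepsilon}(\Psi, \Phi) = \lim_{n\to\infty} \frac{1}{n} \beta_{\varepsilon,n}(\Psi, \Phi)$ exists and we have 
$\beta_{\varepsilon}(\Psi, \Phi) = -s(\Psi, \Phi)$ for each $\varepsilon \in (0, 1)$. 

• Moreover, for all $\varepsilon > 0$ it holds that 

$$\lim_{n\to\infty} \Psi^{(n)}\bigl(u^{\varepsilon}_{\Phi^{(n)}}(s(\Psi) + s(\Psi, \Phi))\bigr) = 1 \quad \text{(relative AEP)}$$

and for each sequence $\{p_n\}$ of projections fulfilling 

$$\Psi^{(n)}(p_n) \xrightarrow[n\to\infty]{} 1 \quad \text{and} \quad \frac{1}{n} \log \mathrm{Tr}\, p_n \xrightarrow[n\to\infty]{} s(\Psi) \tag{11}$$

there is a sequence $\varepsilon_n \searrow 0$ such that with $u_n := u^{\varepsilon_n}_{\Phi^{(n)}}(s(\Psi) + s(\Psi, \Phi))$, the 
relations 

$$\Psi^{(n)}(\mathrm{supp}(u_n p_n u_n)) \xrightarrow[n\to\infty]{} 1$$

and 

$$\frac{1}{n} \log \Phi^{(n)}(\mathrm{supp}(u_n p_n u_n)) \xrightarrow[n\to\infty]{} -s(\Psi, \Phi) \quad \text{(max. separating projection)}$$

are fulfilled. 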

Remark. The quantum Shannon-McMillan theorem pQ guarantees the ex- 
istence of a sequence of projections $\{p_n\}$ with the properties assumed in (11). 
We refer to such sequences as entropy-typical w.r.t. $\Psi$. Roughly speaking, the 
above proposition shows that one obtains a sequence of maximally separating 
projections as an 'intersection' of $\Psi$-entropy-typical projections with appropri- 
ate eigen-projections of the reference state $\Phi$. 
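
In the abelian (classical) iid case the relative AEP can be checked directly: the eigenvalues of the reference density operator are the probabilities $Q^{(n)}(x)$, and the relevant probability measure is $P^{(n)}$. The following sketch, assuming Bernoulli processes P and Q and using hypothetical function names, computes the $P^{(n)}$-mass of those sequences x for which $-\frac{1}{n}\log Q^{(n)}(x)$ is within ε of the mixed term $s(\Psi) + s(\Psi, \Phi)$, which here is the cross-entropy $-\sum_a P(a) \log Q(a)$.

```python
import math

def log_binom_pmf(n, k, p):
    """log of P(K = k) for K ~ Binomial(n, p), 0 < p < 1."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))

def relative_aep_mass(p, q, n, eps):
    """P^(n)-mass of sequences x with |-(1/n) log Q^(n)(x) - cross-entropy| < eps."""
    cross_entropy = -p * math.log(q) - (1 - p) * math.log(1 - q)
    mass = 0.0
    for k in range(n + 1):
        # Every sequence with k ones has the same Q^(n)-probability.
        value = -(k * math.log(q) + (n - k) * math.log(1 - q)) / n
        if abs(value - cross_entropy) < eps:
            mass += math.exp(log_binom_pmf(n, k, p))
    return mass
```

For fixed ε the returned mass increases towards 1 as n grows, which is the classical iid shadow of the relative AEP in Proposition 9.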

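Proof. 1. First assume $s(\Psi, \Phi) < +\infty$. By the monotonicity of the relative 
entropy we may conclude that $S(\Psi^{(n)}, \Phi^{(n)}) < +\infty$ for each n. We have 

$$\frac{1}{n} S(\Psi^{(n)}, \Phi^{(n)}) = -\frac{1}{n} S(\Psi^{(n)}) - \frac{1}{n} \mathrm{Tr}\, D_{\Psi^{(n)}} \log D_{\Phi^{(n)}}.$$

Let $\{\lambda_i\}$ be the set of non-zero eigenvalues of $D_{\Phi^{(n)}}$. We get 

$$\frac{1}{n} S(\Psi^{(n)}, \Phi^{(n)}) = -\frac{1}{n} S(\Psi^{(n)}) - \frac{1}{n} \sum_i \log \lambda_i \, \mathrm{Tr}\, D_{\Psi^{(n)}} \mathrm{spec}_{\lambda_i}(\Phi^{(n)}).$$

Let $\varepsilon > 0$ and 

$$p_{n,\varepsilon} := \sum_{\lambda_i > e^{-n(s(\Psi) + s(\Psi, \Phi) - \varepsilon)}} \mathrm{spec}_{\lambda_i}(\Phi^{(n)}).$$

We claim that 

$$\lim_{n\to\infty} \Psi^{(n)}(p_{n,\varepsilon}) = 0 \quad \text{for all } \varepsilon > 0.$$

In fact, suppose on the contrary that for some $\varepsilon > 0$ we have 

$$\limsup_{n\to\infty} \Psi^{(n)}(p_{n,\varepsilon}) > 0.$$

We conclude the existence of some $\gamma > 0$ and some subsequence $\{n_j\}$ with 

$$\Psi^{(n_j)}(p_{n_j,\varepsilon}) > \gamma > 0.$$

Fix some $\alpha \in (0,1)$, $\delta > 0$. Let $p_{n_j} := p_{n_j,\varepsilon}$, $q_{n_j} := \operatorname{argmin} \beta_{\alpha,n_j}(\Psi, \Phi)$ and 
$u_{n_j} := u^{\delta}_{\Psi^{(n_j)}}(s(\Psi))$. Then $D_{\Psi^{(n_j)}} u_{n_j} \le c\, u_{n_j}$ for $c = e^{-n_j(s(\Psi) - \delta)}$ and by 
Lemma 8 and the quantum Shannon-McMillan theorem we arrive at 

$$\Psi^{(n_j)}(q_{n_j} p_{n_j} q_{n_j} u_{n_j}) \ge \gamma - 2\sqrt{\alpha} - \delta > 0,$$

and 

$$\mathrm{Tr}(q_{n_j} p_{n_j} q_{n_j}) \ge e^{n_j(s(\Psi) - \delta)} (\gamma - 2\sqrt{\alpha} - \delta), \tag{12}$$

if j is large enough and if $2\sqrt{\alpha} + \delta < \gamma$. Now, observe that $D_{\Phi^{(n_j)}}$ and $p_{n_j}$ 
commute and that consequently we have $D_{\Phi^{(n_j)}} \ge e^{-n_j(s(\Psi) + s(\Psi, \Phi) - \varepsilon)}\, p_{n_j}$ by 
definition of $p_{n_j}$. Thus we obtain 

$$q_{n_j} D_{\Phi^{(n_j)}} q_{n_j} \ge e^{-n_j(s(\Psi) + s(\Psi, \Phi) - \varepsilon)}\, q_{n_j} p_{n_j} q_{n_j}. \tag{13}$$

After applying trace to both sides of this inequality, taking logarithms, dividing 
by $n_j$, taking limit superior and using (12) we are led to 

$$\overline{\beta}_{\alpha}(\Psi, \Phi) \ge -s(\Psi, \Phi) + \varepsilon - \delta > -s(\Psi, \Phi),$$

which contradicts the assumed HP-condition provided that $\delta < \varepsilon$. 

In the case $s(\Psi, \Phi) = \infty$ everything can be done in the same way, we just have 
to substitute the definition of $p_{n,\varepsilon}$ by 

$$p_{n,\varepsilon} := \sum_{\lambda_i > e^{-n/\varepsilon}} \mathrm{spec}_{\lambda_i}(\Phi^{(n)})$$

and obtain $\overline{\beta}_{\alpha} \ge -\frac{1}{\varepsilon} + s(\Psi) - \delta$, again in contradiction to $\overline{\beta}_{\alpha}(\Psi, \Phi) = -\infty$, 
hence again the projectors $p_{n,\varepsilon}$ have asymptotically vanishing expectation with 
respect to $\Psi$ for each positive $\varepsilon$. 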

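2. Let first $s(\Psi, \Phi) < \infty$. We have $\frac{1}{n} S(\Psi^{(n)}, \Phi^{(n)}) \to s(\Psi, \Phi)$ as $n \to \infty$ by 
assumption, hence 

$$-\frac{1}{n} \mathrm{Tr}\, D_{\Psi^{(n)}} \log D_{\Phi^{(n)}} \xrightarrow[n\to\infty]{} s(\Psi) + s(\Psi, \Phi)$$

and the mixed term $-\frac{1}{n} \mathrm{Tr}\, D_{\Psi^{(n)}} \log D_{\Phi^{(n)}}$ is the expectation value of the random 
variable $-\frac{1}{n} \log \lambda_i$ with respect to the probability measure given by 

$$\{\mathrm{Tr}\, D_{\Psi^{(n)}} \mathrm{spec}_{\lambda_i}(\Phi^{(n)})\},$$

where again $\{\lambda_i\}$ runs through the non-zero eigenvalues of $\Phi^{(n)}$. On the other 
hand, we have shown in 1. that the lower bounded random variable $-\frac{1}{n} \log \lambda_i \ge 0$ 
is bounded from below asymptotically in probability by the quantity $s(\Psi) + s(\Psi, \Phi)$, being 
its asymptotic expectation value at the same time, i.e. $\lim_{n\to\infty} \Psi^{(n)}(p_{n,\varepsilon}) = 0$ for 
all $\varepsilon > 0$. From this it easily follows that 

$$\lim_{n\to\infty} \Psi^{(n)}(t_{n,\delta}) = 0 \quad \text{for all } \delta > 0,$$

where 

$$t_{n,\delta} := \sum_{\lambda_i < e^{-n(s(\Psi) + s(\Psi, \Phi) + \delta)}} \mathrm{spec}_{\lambda_i}(\Phi^{(n)}).$$

This is the assertion 

$$\lim_{n\to\infty} \Psi^{(n)}\bigl(u^{\varepsilon}_{\Phi^{(n)}}(s(\Psi) + s(\Psi, \Phi))\bigr) = 1 \tag{14}$$

for all $\varepsilon > 0$. In the case $s(\Psi, \Phi) = \infty$ the relative AEP follows immediately 
from 1. 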

3. First assume $s(\Psi, \Phi) < \infty$. Fix some $\varepsilon$ and some $\alpha \in (0, 1)$. Let $\{q_n\}$ be 
any sequence of projections fulfilling $\Psi^{(n)}(q_n) \ge 1 - \alpha$ for n large enough. Let 
$p_n := u^{\varepsilon}_{\Phi^{(n)}}(s(\Psi) + s(\Psi, \Phi))$. We proved that $\Psi^{(n)}(p_n) \to 1$ as $n \to \infty$. Now as in 1. we 
may conclude 

$$\Phi^{(n)}(q_n) \ge e^{-n(s(\Psi) + s(\Psi, \Phi) + \varepsilon)}\, \mathrm{Tr}\, p_n q_n.$$

Using the quantum Shannon-McMillan theorem and again Lemma 8, this time 
applied to the density operator of $\Psi^{(n)}$ and with $u := \sum_{\lambda \le e^{-n(s(\Psi) - \delta)}} \mathrm{spec}_{\lambda}(\Psi^{(n)})$, 
for arbitrary $\delta > 0$ and n large enough we have $\mathrm{Tr}\, p_n q_n \ge e^{n(s(\Psi) - \delta)}\, a$ for some 
$0 < a < 1$ independent of n. Hence we get for any $\varepsilon, \delta > 0$ 

$$\frac{1}{n} \log \Phi^{(n)}(q_n) \ge -s(\Psi, \Phi) - \varepsilon - \delta$$

for n large enough. Therefore, the quantity $\beta_{\alpha}(\Psi, \Phi)$ exists for any $\alpha \in (0, 1)$ 
and coincides with $-s(\Psi, \Phi)$, where we used again the HP-condition. In the 
case $s(\Psi, \Phi) = \infty$ this assertion is a trivial consequence of $\overline{\beta}_{\alpha}(\Psi, \Phi) = -\infty$. 

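4. Let $\{p_n\}$ be a sequence of projections $p_n \in \mathcal{A}^{(n)}$ with $\lim_{n\to\infty} \Psi^{(n)}(p_n) = 1$ 
and $\lim_{n\to\infty} \frac{1}{n} \log \mathrm{Tr}(p_n) = s(\Psi)$. Fix some $\varepsilon > 0$. Let us write $u_n$ in- 
stead of $u^{\varepsilon}_{\Phi^{(n)}}(s(\Psi) + s(\Psi, \Phi))$ for short. From (14) and Lemma 8 we infer that 
$\Psi^{(n)}(u_n p_n u_n) \to 1$ as $n \to \infty$. Now $u_n p_n u_n$ is a positive operator being upper bounded 
by its support projection $\mathrm{supp}(u_n p_n u_n)$, which proves $\Psi^{(n)}(\mathrm{supp}(u_n p_n u_n)) \to 1$ 
as $n \to \infty$. From this we easily conclude that we may even substitute the $\varepsilon$ in the def- 
inition of $u_n = u^{\varepsilon}_{\Phi^{(n)}}(s(\Psi) + s(\Psi, \Phi))$ by a suitable sequence $\varepsilon_n \searrow 0$ and still 
have $\Psi^{(n)}(\mathrm{supp}(u_n p_n u_n)) \to 1$ as $n \to \infty$. 

On the other hand, we have $\mathrm{supp}(u_n p_n u_n) \le u_n$ as well as $\mathrm{Tr}(\mathrm{supp}(u_n p_n u_n)) \le 
\mathrm{Tr}(p_n)$. Hence we get in the case $s(\Psi, \Phi) < \infty$ 

$$\frac{1}{n} \log \Phi^{(n)}(\mathrm{supp}(u_n p_n u_n)) \le \frac{1}{n}\bigl(-n(s(\Psi) + s(\Psi, \Phi) - \varepsilon_n) + \log \mathrm{Tr}(p_n)\bigr) \xrightarrow[n\to\infty]{} -s(\Psi, \Phi),$$

resp. for $s(\Psi, \Phi) = \infty$ 

$$\frac{1}{n} \log \Phi^{(n)}(\mathrm{supp}(u_n p_n u_n)) \le \frac{1}{n}\bigl(-n/\varepsilon_n + \log \mathrm{Tr}(p_n)\bigr) \xrightarrow[n\to\infty]{} -\infty.$$

This together with the fact we proved that no sequence of $\Psi$-typical projec- 
tions has a better lower limit of the separation rate than $-s(\Psi, \Phi)$ shows now 
that 

$$\frac{1}{n} \log \Phi^{(n)}(\mathrm{supp}(u_n p_n u_n)) \xrightarrow[n\to\infty]{} -s(\Psi, \Phi).$$

We proved all assertions of the proposition. ■ 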



3.2 Stein's Lemma implies Sanov's Theorem 

With the preliminaries given in the last subsection, it is now easy to complete 
the proof of Theorem 5. 
Proof. Let 

$$s_{\min} := \inf_{\Psi \in \Omega} s(\Psi) \quad \text{and} \quad s_{\max} := \sup_{\Psi \in \Omega} s(\Psi),$$

where $s(\Psi)$ denotes the von Neumann entropy rate of the ergodic state $\Psi \in \Omega$. 
Choose $s_1, \dots, s_m$ satisfying (for $\eta := m^{-1}(s_{\max} - s_{\min})$) 

$$s_{\min} = s_1 < s_2 < \dots < s_{m-1} < s_m = s_{\max} \quad \text{and} \quad s_i - s_{i-1} = \eta,\ i \in \{2, \dots, m\}.$$

Define $s_{m+1} = s_m + \eta$. 

Let first $s(\Omega,\Phi) < +\infty$. For $i \in \{1, \dots, m\}$ we consider the collection of disjoint intervals

\[ I_i := \Big(s_i + s(\Omega,\Phi) - \frac{\eta}{2},\; s_i + s(\Omega,\Phi) + \frac{\eta}{2}\Big] \]

and

\[ I_{m+1} := \Big(s_{\max} + s(\Omega,\Phi) + \frac{\eta}{2},\; \infty\Big). \]

Moreover we define the following projections

\[ u_{n,i} := \sum_{\lambda:\; -\frac{1}{n}\log\lambda \,\in\, I_i} \mathrm{spec}_\lambda(\Phi^{(n)}) \]

and

\[ u_{n,m+1} := \sum_{\lambda:\; -\frac{1}{n}\log\lambda \,>\, s_{\max} + s(\Omega,\Phi) + \eta/2} \mathrm{spec}_\lambda(\Phi^{(n)}), \]

where the summations extend over the eigenvalues $\lambda$ of the density operator of $\Phi^{(n)}$. Additionally we consider universally typical projections $p_{n,i}$, $i \in \{1, \dots, m\}$ (according to the Kaltchenko-Yang universality result [14]) to the levels $s_i + \eta$ (i.e.

\[ \lim_{n\to\infty} \frac{1}{n}\log \mathrm{Tr}(p_{n,i}) = s_i + \eta \tag{15} \]

and

\[ \lim_{n\to\infty} \Psi^{(n)}(p_{n,i}) = 1 \tag{16} \]

for each ergodic state $\Psi$ with $s(\Psi) < s_i + \eta$). In addition, set $p_{n,m+1} := p_{n,m}$. We may choose the sequence of these projections to be ascending, i.e.

\[ p_{n,i} \le p_{n,i+1}, \tag{17} \]

since otherwise we may define

\[ \tilde{p}_{n,i} := \bigvee_{j=1}^{i} p_{n,j}. \]

The $\tilde{p}_{n,i}$ fulfil (15) and (16) as well, so we may work with these instead of $p_{n,i}$.
Set $r_{n,i} = \mathrm{supp}(u_{n,i}\, p_{n,i}\, u_{n,i})$ for $i = 1, 2, \dots, m+1$ and define $p_n$ by

\[ p_n := \sum_{i=1}^{m+1} r_{n,i}. \]

(Observe that the $r_{n,i}$ are mutually orthogonal.)

For $\Psi \in \Omega$ let $i_0 \in \{1, \dots, m+1\}$ be the index fulfilling $s(\Psi) + s(\Psi,\Phi) \in I_{i_0}$. This means that $s(\Psi) \le s_{i_0} + \eta/2 < s_{i_0} + \eta$. Consequently, by (16) we obtain

\[ \lim_{n\to\infty} \Psi^{(n)}(p_{n,i_0}) = 1. \]

Further, by the relative AEP of the preceding subsection, $\lim_{n\to\infty} \Psi^{(n)}(u_{n,i_0} + u_{n,i_0+1}) = 1$ for $i_0 \in \{1, \dots, m\}$, and $\lim_{n\to\infty} \Psi^{(n)}(u_{n,m+1}) = 1$ are satisfied. We add the projection $u_{n,i_0+1}$ for $i_0 \in \{1, \dots, m\}$ in order to cover the case where the mixed term is equal to the right end point of $I_{i_0}$. We conclude from (17) and the lemma above that $\Psi^{(n)}(r_{n,i_0} + r_{n,i_0+1}) \to 1$ for $i_0 \in \{1, \dots, m\}$, and $\Psi^{(n)}(r_{n,m+1}) \to 1$. Therefore

\[ \lim_{n\to\infty} \Psi^{(n)}(p_n) = 1. \]

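On the other hand we have for $n$ sufficiently large by (15) and by definition of $\eta$

\[ \begin{aligned}
\Phi^{(n)}(p_n) = \sum_{i=1}^{m+1} \Phi^{(n)}(r_{n,i}) &\le \sum_{i=1}^{m+1} \mathrm{Tr}(r_{n,i})\, e^{-n(s_i + s(\Omega,\Phi) - \eta/2)} \\
&\le \sum_{i=1}^{m+1} e^{n(s_i + 2\eta)}\, e^{-n(s_i + s(\Omega,\Phi) - \eta/2)} \\
&= e^{-n\left(s(\Omega,\Phi) - \frac{5}{2}\eta - \frac{\log(m+1)}{n}\right)} \\
&= e^{-n\left(s(\Omega,\Phi) - \frac{5}{2} m^{-1}(s_{\max} - s_{\min}) - \frac{\log(m+1)}{n}\right)}.
\end{aligned} \]

So, by choosing $m$ sufficiently large, we get the asserted upper bound on the separation exponent.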
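The last two equalities in this chain are elementary and can be checked numerically. The following Python sketch is only an illustration: the function name `sanov_bound_terms` and all sample values (the levels, the relative entropy rate, $\eta$ and $n$) are assumptions made for the example and do not come from the proof.

```python
import math


def sanov_bound_terms(n, s_levels, s_rel, eta):
    """Compare the term-by-term sum with the closed-form exponent.

    s_levels : hypothetical entropy levels s_1, ..., s_{m+1}
    s_rel    : hypothetical value of the relative entropy rate s(Omega, Phi)
    eta      : spacing between consecutive levels
    """
    # Each summand is e^{n(s_i + 2*eta)} * e^{-n(s_i + s_rel - eta/2)}.
    direct = sum(
        math.exp(n * (s_i + 2 * eta)) * math.exp(-n * (s_i + s_rel - eta / 2))
        for s_i in s_levels
    )
    # Closed form e^{-n(s_rel - (5/2)*eta - log(m+1)/n)}, where m+1 = len(s_levels).
    closed = math.exp(-n * (s_rel - 2.5 * eta - math.log(len(s_levels)) / n))
    return direct, closed


# Illustrative values only (assumed for this sketch).
levels = [0.2, 0.3, 0.4, 0.5]
direct, closed = sanov_bound_terms(n=40, s_levels=levels, s_rel=0.7, eta=0.1)
rate = -math.log(closed) / 40  # effective exponent per site
```

The correction $\log(m+1)/n$ vanishes as $n$ grows, so the exponent approaches $s(\Omega,\Phi) - \frac{5}{2}\eta$, which is why a fine grid (large $m$, small $\eta$) is what matters.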

The case s(Q,$) = +00 easily follows by setting p n := w^ (n) (oo), see (fll 
By Proposition [5] the projection p n is asymptotically typical for all € 0, 
and we have $ (n) (Pn) <Tr(l^ ( „) )e~ n ^ 1 = e-n^-logTrOU)^ so again we get 
statement ([5]). 

Finally, the fact that a better separation exponent than $-s(\Omega,\Phi)$ is not achievable immediately follows from Proposition 9. ■

4 *-mixing implies the HP-condition

We start with a proposition extending the result in [13] to the case of only ergodic (instead of completely ergodic) $\Psi$.

Theorem 10 Let $\Phi \in \mathcal{S}_*(\mathcal{A}^{\mathbb{Z}})$. Then $\Phi$ is an HP-state.

Recall that $\mathcal{S}_*(\mathcal{A}^{\mathbb{Z}})$ denotes the set of stationary $*$-mixing states (see the definition of $*$-mixing).

Proof. 1. Let $\Psi \in \mathcal{S}_{\mathrm{erg}}(\mathcal{A}^{\mathbb{Z}})$. The relative entropy rate $s(\Psi,\Phi) \le +\infty$ exists in view of [13], Theorem 2.1, in connection with Remark 4.2, ibid. (even if only $\Psi \in \mathcal{S}_{\mathrm{stat}}(\mathcal{A}^{\mathbb{Z}})$ is assumed).

2. Fix some $l \in \mathbb{N}$, another integer $m$ (which in the sequel has to be chosen large enough) and represent the quasilocal C*-algebra $\mathcal{A}^{\mathbb{Z}}$ as the C*-algebra $(\mathcal{A}^{\otimes l} \otimes \mathcal{A}^{\otimes m})^{\mathbb{Z}}$, i.e. partition the integers into blocks of length $l + m$, where each block consists of a starting part of length $l$ and the remaining part of length $m$. Clearly, the relative entropy rate $s_{(l+m)}(\Psi,\Phi)$ with respect to this new partitioning exists, and we have $s_{(l+m)}(\Psi,\Phi) = (l+m)\,s(\Psi,\Phi)$. With respect to the canonical shift operator $\tau_{l,m} := \tau^{l+m}$ acting in $(\mathcal{A}^{\otimes l} \otimes \mathcal{A}^{\otimes m})^{\mathbb{Z}}$, the state $\Psi$ is still stationary, but may fail to be ergodic. Anyway, it has a finite ergodic decomposition

\[ \Psi = \frac{1}{l+m} \sum_{r=0}^{l+m-1} \Psi_{(r,l+m)}, \]

where some of the ergodic components may coincide, and all $\Psi_{(r,l+m)}$ have the same entropy rate $s_{(l+m)}(\Psi_{(r,l+m)}) = s_{(l+m)}(\Psi) = (l+m)\,s(\Psi)$ (cf. [1]). The ergodic components also have the same relative entropy rate $s_{(l+m)}(\Psi_{(r,l+m)},\Phi) = s_{(l+m)}(\Psi,\Phi) = (l+m)\,s(\Psi,\Phi)$. Observe that this was shown in [3] for the case of stationary product states $\Phi$. However, the proof only makes use of the existence of the relative entropy rates $s_{(l+m)}(\Psi_{(r,l+m)},\Phi)$, which is guaranteed in our situation (see 1.), the monotonicity of the relative entropy, and the affinity of the relative entropy rate with respect to its first argument. Hence the relation extends to $*$-mixing reference states.

Next, denote by $\mathcal{I}$ the trivial subalgebra of $\mathcal{A}$ generated by the unit element $1_{\mathcal{A}}$ and consider the C*-subalgebra $\mathcal{I}^{\otimes l} \otimes \mathcal{A}^{\otimes m}$ of $\mathcal{A}^{\otimes l} \otimes \mathcal{A}^{\otimes m}$. Then $(\mathcal{I}^{\otimes l} \otimes \mathcal{A}^{\otimes m})^{\mathbb{Z}}$ is a C*-subalgebra of the quasi-local algebra $(\mathcal{A}^{\otimes l} \otimes \mathcal{A}^{\otimes m})^{\mathbb{Z}}$, and by [BJ, Theorem 4.3.17, the restrictions $\Psi_{(r,l,m)}$ of the ergodic components $\Psi_{(r,l+m)}$ to $(\mathcal{I}^{\otimes l} \otimes \mathcal{A}^{\otimes m})^{\mathbb{Z}}$ are ergodic, too. They are the ergodic components of $\Psi_{(l,m)} := \Psi \restriction (\mathcal{I}^{\otimes l} \otimes \mathcal{A}^{\otimes m})^{\mathbb{Z}}$.

We introduce the ($\tau_{l,m}$-)stationary product state $\Phi_{(l+m)}$ on $(\mathcal{A}^{\otimes l} \otimes \mathcal{A}^{\otimes m})^{\mathbb{Z}}$ which is uniquely defined by its one-site marginals $\Phi \restriction \mathcal{A}^{\otimes l} \otimes \mathcal{A}^{\otimes m}$, and consider its restriction $\Phi_{(l,m)}$ to the C*-algebra $(\mathcal{I}^{\otimes l} \otimes \mathcal{A}^{\otimes m})^{\mathbb{Z}}$, which is a ($\tau_{l,m}$-)stationary product state, too.

3. In the following we have to take into account whether $s(\Psi,\Phi)$ is finite or infinite. Let us first treat the case $s(\Psi,\Phi) < +\infty$.
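Define for two states $\psi, \varphi$ on a C*-algebra $\mathcal{A}$,

\[ S_{co}(\psi,\varphi) := \sup\Big\{ \sum_j \psi(q_j) \log\frac{\psi(q_j)}{\varphi(q_j)} \;:\; q_j \text{ projections with } \sum_j q_j = 1 \Big\} \]

(cf. [12]).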

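Consider the relative entropies $S(\Psi^{(l+m)}_{(r,l,m)}, \Phi^{(l+m)}_{(l,m)})$; then we get by the subadditivity of entropy and by the fact that the rates of the quantities $S$ and $S_{co}$ coincide (Hiai and Petz [12])

From the definition of $S_{co}$ and $*$-mixing we obtain now

This is the same technique as used by Hiai and Petz in [13]. Again from the definition of $S_{co}$ we get

\[ s_{(l+m)}(\Psi_{(r,l,m)}, \Phi_{(l,m)}) \le -\log\alpha + \lim_{k\to\infty} \frac{1}{k} S_{co}\big(\Psi^{(k(l+m))}_{(r,l+m)}, \Phi^{(k(l+m))}\big), \]

and from the relation $S_{co} \le S$ (see [11]), which is a consequence of the monotonicity of the relative entropy, we arrive at

\[ \begin{aligned}
S\big(\Psi^{(l+m)}_{(r,l,m)}, \Phi^{(l+m)}_{(l,m)}\big) &\le -\log\alpha + \lim_{k\to\infty} \frac{1}{k} S\big(\Psi^{(k(l+m))}_{(r,l+m)}, \Phi^{(k(l+m))}\big) \\
&\le -\log\alpha + s_{(l+m)}(\Psi_{(r,l+m)},\Phi) = -\log\alpha + (l+m)\,s(\Psi,\Phi).
\end{aligned} \tag{19} \]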

This upper bound may be utilized to derive an essential lower bound. For an arbitrarily chosen $\eta > 0$, define

\[ A_{l,m,\eta} := \Big\{ r : 0 \le r < l+m,\ \frac{1}{l+m}\, S\big(\Psi^{(l+m)}_{(r,l,m)}, \Phi^{(l+m)}_{(l,m)}\big) < s(\Psi,\Phi) - \eta \Big\}. \]

The convexity of the relative entropy in its first argument together with (19) yields

l_ 5 (*<™),*<«>) = j^s^^t:?) ( 2 °) 



- m 



S (l +m )2 Z^ 5 ^(r-,i,m)' <P (m,0 J 

l + m y v y " / + m V 1 + 171 

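Fixing $l$ and letting $m \to \infty$, the expression $\frac{1}{m} S(\Psi^{(m)}, \Phi^{(m)})$ tends to $s(\Psi,\Phi)$. This immediately leads to the conclusion that

\[ \lim_{m\to\infty} \frac{\# A_{l,m,\eta}}{l+m} = 0 \quad\text{for each } l, \eta. \tag{21} \]

For each $r \in A^c_{l,m,\eta}$, the relative entropy rate fulfills

\[ \frac{1}{l+m}\, s_{(l+m)}\big(\Psi_{(r,l,m)}, \Phi_{(l,m)}\big) \ge s(\Psi,\Phi) - \eta, \tag{22} \]

too, since $\Phi_{(l,m)}$ is a $\tau_{l+m}$-stationary product state.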

4. Hence we are in the situation treated in [3]. The main assertion of [3] is the quantum Stein's lemma saying that for any given $\varepsilon > 0$ it is possible to construct projections $p_{r,n,\varepsilon} \in (\mathcal{I}^{\otimes l} \otimes \mathcal{A}^{\otimes m})^{\otimes n}$ which are $\varepsilon$-typical with respect to $\Psi_{(r,l,m)}$ (i.e. $\Psi^{((l+m)n)}_{(r,l,m)}(p_{r,n,\varepsilon}) \ge 1 - \varepsilon$ for large $n$) and maximally separating: $\Phi^{((l+m)n)}_{(l,m)}(p_{r,n,\varepsilon}) \le e^{-n(s_{(l+m)}(\Psi_{(r,l,m)},\Phi_{(l,m)}) - \varepsilon)}$ for large $n$. Moreover, the quantum relative AEP (Theorem 2 in [3]) ensures, in particular, that if $n$ is sufficiently large we have

\[ \mathrm{Tr}\, p_{r,n,\varepsilon} \le e^{n(s_{(l+m)}(\Psi_{(r,l,m)}) + \varepsilon)} \quad\text{and} \]

\[ \Phi^{((l+m)n)}_{(l,m)}(p) \le e^{-n\left(s_{(l+m)}(\Psi_{(r,l,m)}) + s_{(l+m)}(\Psi_{(r,l,m)},\Phi_{(l,m)}) - \varepsilon\right)} \]

for each minimal projection $p \le p_{r,n,\varepsilon}$.

For our purpose we need a bit more information about the construction of these maximally separating projections. In the course of the proof of Theorem 2 in [3] the projections $p_{r,n,\varepsilon}$ are constructed in the following way:

a) A super-block length $L$ is chosen, where the only requirement on $L$ is that it is large enough to ensure some appropriate entropy approximation (any larger $L$ will do, too).

b) The projections $p_{r,nL,\varepsilon}$ are constructed as certain sub-projections of the projection

\[ p^{(nL)} := \sum_{\lambda:\; -\frac{1}{nL}\log\lambda \,\ge\, s_{(l+m)}(\Psi_{(r,l,m)}) + s_{(l+m)}(\Psi_{(r,l,m)},\Phi_{(l,m)}) - \varepsilon} \mathrm{spec}_\lambda\big((\Phi^{(l+m)}_{(l,m)})^{\otimes nL}\big). \tag{23} \]

c) The remaining projections $p_{r,nL+k,\varepsilon}$ for $1 \le k < L$ are constructed as

\[ p_{r,nL,\varepsilon} \otimes \big(1_{\mathcal{A}^{\otimes (l+m)}}\big)^{\otimes k}. \]

For given $l$ and $m$ we may choose one and the same super-block length $L$ for the different $r \in \{0, 1, \dots, l+m-1\}$ and define our separating projections first for the multiples of $L(l+m)$ by

\[ q_{nL(l+m),\varepsilon} := \bigvee_{r \in A^c_{l,m,\varepsilon}} p_{r,nL,\varepsilon}. \]

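In view of (23) we get, using (22),

\[ \begin{aligned}
q_{nL(l+m),\varepsilon} &\le \sum_{\lambda:\; -\frac{1}{nL}\log\lambda \,\ge\, \min_{r \in A^c_{l,m,\varepsilon}} s_{(l+m)}(\Psi_{(r,l,m)}) + \min_{r \in A^c_{l,m,\varepsilon}} s_{(l+m)}(\Psi_{(r,l,m)},\Phi_{(l,m)}) - \varepsilon} \mathrm{spec}_\lambda\big((\Phi^{(l+m)}_{(l,m)})^{\otimes nL}\big) \\
&\le \sum_{\lambda:\; -\frac{1}{nL}\log\lambda \,\ge\, \min_{r \in A^c_{l,m,\varepsilon}} s_{(l+m)}(\Psi_{(r,l,m)}) + (l+m)\,s(\Psi,\Phi) - (l+m+1)\varepsilon} \mathrm{spec}_\lambda\big((\Phi^{(l+m)}_{(l,m)})^{\otimes nL}\big).
\end{aligned} \tag{24} \]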

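Next, observe that by the subadditivity of the entropy we have for each $r$

\[ \begin{aligned}
s_{(l+m)}(\Psi_{(r,l,m)}) &= \lim_{k\to\infty} \frac{1}{k} S\big(\Psi^{(k(l+m))}_{(r,l,m)}\big) = \lim_{k\to\infty} \frac{1}{k} S\big(\Psi^{(k(l+m))}_{(r,l+m)} \restriction (\mathcal{I}^{\otimes l} \otimes \mathcal{A}^{\otimes m})^{\otimes k}\big) \\
&\ge \lim_{k\to\infty} \frac{1}{k}\Big(S\big(\Psi^{(k(l+m))}_{(r,l+m)}\big) - k\,l\,\mathrm{Tr}\,1_{\mathcal{A}}\Big) \\
&= s_{(l+m)}(\Psi_{(r,l+m)}) - l\,\mathrm{Tr}\,1_{\mathcal{A}} = (l+m)\,s(\Psi) - l\,\mathrm{Tr}\,1_{\mathcal{A}},
\end{aligned} \]

and hence for sufficiently large $m$ we may continue the chain of inequalities (24):

\[ q_{nL(l+m),\varepsilon} \le \sum_{\lambda:\; -\log\lambda \,\ge\, nL(l+m)(s(\Psi) + s(\Psi,\Phi) - 3\varepsilon)} \mathrm{spec}_\lambda\big((\Phi^{(l+m)}_{(l,m)})^{\otimes nL}\big). \]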

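From this we derive the following upper bound, being valid for $m$ large enough (using the Araki-Lieb inequality in the fifth line)

\[ \begin{aligned}
\Phi^{(nL(l+m))}_{(l,m)}\big(q_{nL(l+m),\varepsilon}\big)
&\le e^{-nL(l+m)(s(\Psi)+s(\Psi,\Phi)-3\varepsilon)}\, \mathrm{Tr}\, q_{nL(l+m),\varepsilon} \\
&\le e^{-nL(l+m)(s(\Psi)+s(\Psi,\Phi)-3\varepsilon)} \sum_{r \in A^c_{l,m,\varepsilon}} \mathrm{Tr}\, p_{r,nL,\varepsilon} \\
&\le e^{-nL(l+m)(s(\Psi)+s(\Psi,\Phi)-3\varepsilon)} \sum_{r \in A^c_{l,m,\varepsilon}} e^{nL(s_{(l+m)}(\Psi_{(r,l,m)})+\varepsilon)} \\
&\le e^{-nL(l+m)(s(\Psi)+s(\Psi,\Phi)-3\varepsilon)}\, (l+m)\, e^{nL(l+m)(s(\Psi)+2\varepsilon)} \\
&\le e^{-nL(l+m)(s(\Psi,\Phi)-6\varepsilon)}.
\end{aligned} \]

From $*$-mixing we get now the desired separation order

\[ \begin{aligned}
\Phi^{(nL(l+m))}\big(q_{nL(l+m),\varepsilon}\big) &\le e^{-nL(l+m)(s(\Psi,\Phi)-6\varepsilon)}\, \alpha^{-nL} \\
&= e^{-nL(l+m)\left(s(\Psi,\Phi) + \frac{\log\alpha}{l+m} - 6\varepsilon\right)} \\
&\le e^{-nL(l+m)(s(\Psi,\Phi)-7\varepsilon)}
\end{aligned} \]

(for $m$ large enough).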
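The phrase "for $m$ large enough" in the last step only requires $-\log\alpha/(l+m) \le \varepsilon$, so that the mixing constant costs at most one extra $\varepsilon$ in the exponent. As a small hedged illustration, the Python helper below (the name `minimal_block_length` and the sample numbers are assumptions for this sketch) computes the smallest admissible $m$ for given $\alpha \in (0,1]$, $\varepsilon > 0$ and gap length $l$.

```python
import math


def minimal_block_length(alpha, eps, l):
    """Smallest integer m >= 1 with -log(alpha) / (l + m) <= eps.

    alpha : assumed *-mixing constant, 0 < alpha <= 1
    eps   : tolerated loss in the exponent
    l     : length of the gap part of each block
    """
    needed = -math.log(alpha) / eps - l  # real-valued lower bound on m
    return max(1, math.ceil(needed))


# Illustrative values only.
alpha, eps, l = 0.5, 0.01, 3
m = minimal_block_length(alpha, eps, l)
penalty = -math.log(alpha) / (l + m)  # contribution of alpha^{-nL} per site
```

Since the penalty decays like $1/(l+m)$, any fixed $\alpha > 0$ becomes negligible once the blocks are long compared with $-\log\alpha/\varepsilon$.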

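On the other hand, $\Psi$-typicality is guaranteed by

\[ \begin{aligned}
\Psi^{(nL(l+m))}\big(q_{nL(l+m),\varepsilon}\big) &= \Psi^{(nL(l+m))}\Big(\bigvee_{r \in A^c_{l,m,\varepsilon}} p_{r,nL,\varepsilon}\Big) \\
&= \frac{1}{l+m} \sum_{r'=0}^{l+m-1} \Psi^{(nL(l+m))}_{(r',l+m)}\Big(\bigvee_{r \in A^c_{l,m,\varepsilon}} p_{r,nL,\varepsilon}\Big) \\
&\ge \frac{1}{l+m} \sum_{r' \in A^c_{l,m,\varepsilon}} \Psi^{(nL(l+m))}_{(r',l+m)}\big(p_{r',nL,\varepsilon}\big) \\
&\ge \frac{1}{l+m} \sum_{r' \in A^c_{l,m,\varepsilon}} (1 - \varepsilon),
\end{aligned} \]

the last inequality being valid for large $n$. We may continue

\[ \begin{aligned}
\Psi^{(nL(l+m))}\big(q_{nL(l+m),\varepsilon}\big) &\ge (1-\varepsilon) - \frac{1}{l+m} \sum_{r \in A_{l,m,\varepsilon}} (1-\varepsilon) \\
&\ge (1-\varepsilon) - \frac{\# A_{l,m,\varepsilon}}{l+m} \ge 1 - 2\varepsilon
\end{aligned} \]

for $m$ large enough (by (21)).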

Now (in the usual way) we may interpolate the $q_{nL(l+m),\varepsilon}$ in order to define the projections $q_{n,\varepsilon}$ also for $n \in \mathbb{N}$ which are not multiples of $L(l+m)$. We derived the existence of a sequence of projections being asymptotically $\varepsilon$-typical for $\Psi$ and fulfilling

<S> {n \q n ,e) < e- n ^>^-^ 
for large $n$. This proves that, for any $\varepsilon \in (0,1)$, the separation exponent fulfils $\beta_\varepsilon(\Psi,\Phi) \le -s(\Psi,\Phi)$ for finite $s(\Psi,\Phi)$.

5. Now assume $s(\Psi,\Phi) = +\infty$. Observe that in that case the estimates in (20) and hence (21) are not valid. But (21) becomes true if we replace the definition of $A_{l,m,\eta}$ most appropriately by

:={r:0<r<l + m, j^S®^,*™) < 

In fact, choose $M$ large enough to ensure $S(\Psi^{(M)}, \Phi^{(M)}) > \eta^{-1} M$ (we include the case $S(\Psi^{(M)}, \Phi^{(M)}) = +\infty$). Now we have the ergodic decomposition

\[ \Psi = \frac{1}{M} \sum_{r=0}^{M-1} \Psi_{(r,M)} = \frac{1}{M} \sum_{r=0}^{M-1} \Psi_{(0,M)} \circ \tau^{-r} \tag{25} \]

due to [1]. The states $\Psi_{(r,M)} = \Psi_{(0,M)} \circ \tau^{-r}$ are $\tau^M$-ergodic. In view of the (joint) convexity of the relative entropy we conclude that at least one of the $r$ fulfils $S(\Psi^{(M)}_{(r,M)}, \Phi^{(M)}) > \eta^{-1} M$. We may assume without any loss of generality that this is true for $r = 0$, i.e. $S(\Psi^{(M)}_{(0,M)}, \Phi^{(M)}) > \eta^{-1} M$. The $\tau^M$-ergodic state $\Psi_{(0,M)}$ again has an ergodic decomposition with respect to $\tau^{2M}$

\[ \Psi_{(0,M)} = \frac{1}{2}\big(\Psi' + \Psi' \circ \tau^{-M}\big) \]

and, applying once again the convexity argument, we find that we may assume $S(\Psi'^{(M)}, \Phi^{(M)}) > \eta^{-1} M$. $\Psi'$ is $\tau^{2M}$-ergodic, and we obtain from an ergodic decomposition of $\Psi$ into $\tau^{2M}$-ergodic states

1 2A/-1 

2M ^ 

hence we may assume without loss of generality that &(o,2M) — So we have 

for M large enough. This yields 

S((*(p,2M) ° T- r ) ({r,r+1 -- r+M_1}) , $ (M) ) > T^f for each r 
in view of the definition of r, i.e. (using the stationarity of $) 

5((* (p ,^)«^ 1 --"^- 1 »,* H ^ l, -" , ^" 1>> ) > r'M. 
In view of the monotonicity of the relative entropy we get now for r > I 

S(($(r,l,2M-l)) i2M \$ \l»i 9 A*<») > r X M, 

So again (|2ip is fulfilled for M sufficiently large. We conclude that asymptot- 
ically for the overwhelming part of the r in {0, 1, / + m — 1} the expression 

iq^S(i+m)(*(r,2,m) 5 $(J,m)) is arbitrary large (i.e. > i^ -1 or even infinite) for 
large m. Now we may proceed essentially as in 4., employing the results of 
[3]. We find projections p r . n L,ri, separately for each r in Af m _, which distin- 
guish between ^(r,i,m) and $(i,m) exponentially well at a rate at least^?? -1 , and 
we may join these projections to find ^P-typical projections q n L(i+m),ri- This is 
possible due to the properties a) and b) above, where now b) is modified to 

b') The projections p r ,nL,-q are constructed as certain sub-projections of the 
projection 

p^:= Yl ^((^rn (26) 

log A< — nL\r}~ 1 

But we have to take into account that [3J only treats the case of finite relative 
entropy rate; hence those r, for which ^C^^m)' ^(i"m) ) = are still not 

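covered. For those $r$, simply choose

\[ p_{r,nL,\eta} := \mathrm{spec}_0\big((\Phi^{(l+m)}_{(l,m)})^{\otimes nL}\big). \]

Obviously, this projection fulfils $(\Phi^{(l+m)}_{(l,m)})^{\otimes nL}\big(\mathrm{spec}_0((\Phi^{(l+m)}_{(l,m)})^{\otimes nL})\big) = 0$, and it is a sub-projection of $p^{(nL)}$. We still have to show that $\mathrm{spec}_0((\Phi^{(l+m)}_{(l,m)})^{\otimes nL})$ is asymptotically typical for each $\Psi_{(r,l,m)}$ with $S(\Psi^{(l+m)}_{(r,l,m)}, \Phi^{(l+m)}_{(l,m)}) = +\infty$. In fact, represent the density operator $D^{(m)}$ of $\Phi^{(m)}$ as

\[ D^{(m)} = \sum_{j=1}^{K} \lambda_j w_j, \]

where the $w_j$ are mutually orthogonal minimal projectors in $\mathcal{A}^{\otimes m}$ fulfilling $\sum_j w_j = 1_{\mathcal{A}^{\otimes m}}$ (and the $\lambda_j$ are the eigenvalues of $D^{(m)}$ including 0). Let $v_j := 1_{\mathcal{A}^{\otimes l}} \otimes w_j$. Observe that due to $S(\Psi^{(l+m)}_{(r,l,m)}, \Phi^{(l+m)}_{(l,m)}) = +\infty$ there is at least one $j$ with $\lambda_j = 0$ but $\Psi^{(l+m)}_{(r,l,m)}(v_j) > 0$. Now we have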

(ii,...,j„)eiv n Vfe=i / 

where $N_n := \{(j_1, \dots, j_n) : \prod_{k=1}^{n} \lambda_{j_k} = 0\} = ((N_1^c)^n)^c$. Denote by $\mathcal{B}$ the abelian sub-algebra of $\mathcal{I}^{\otimes l} \otimes \mathcal{A}^{\otimes m}$ generated by the set $\{v_j\}$. Then the quasi-local algebra $\mathcal{B}^{\mathbb{Z}}$ is an abelian sub-algebra of $(\mathcal{I}^{\otimes l} \otimes \mathcal{A}^{\otimes m})^{\mathbb{Z}}$ and the restriction $P$ of $\Psi_{(r,l,m)}$ to this sub-algebra is a classical ergodic process with $K$ symbols (Gelfand isomorphism and Riesz representation theorem). This process fulfils $P^{(1)}(\{j\}) > 0$. We may continue the left-hand side in (27) as follows

\[ \begin{aligned}
\Psi^{(n(l+m))}_{(r,l,m)}\Big(\mathrm{spec}_0\big((\Phi^{(l+m)}_{(l,m)})^{\otimes n}\big)\Big) &= P^{(n)}(N_n) \\
&= 1 - P^{(n)}\big((N_1^c)^n\big) \\
&\ge 1 - P^{(n)}\big((\{j\}^c)^n\big).
\end{aligned} \]

Now $P^{(n)}((\{j\}^c)^n)$ is the probability of all $n$-sequences of symbols where the symbol $j$ does not appear at all. This tends to zero, since by the individual ergodic theorem the a.s. asymptotic frequency of the symbol $j$ is $P^{(1)}(\{j\}) = \Psi^{(l+m)}_{(r,l,m)}(v_j) > 0$ by assumption. Hence the conclusions of part 4. are valid in the case of infinite relative entropy, too. ■
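The final step, that a symbol of positive stationary probability cannot be avoided forever by an ergodic process, can be made concrete in a toy case. The Python sketch below is an illustration under an added assumption: the classical process $P$ is taken to be a stationary finite-state Markov chain (the proof itself only needs ergodicity). The function name `avoidance_probability` and the transition matrix are hypothetical.

```python
def avoidance_probability(transition, stationary, symbol, n):
    """Probability that a stationary Markov chain never emits `symbol`
    during n consecutive steps, i.e. P^(n)(({symbol}^c)^n).

    transition : row-stochastic matrix as a list of lists
    stationary : stationary distribution of the chain
    """
    allowed = [s for s in range(len(stationary)) if s != symbol]
    # Mass on allowed states at time 1.
    weights = {s: stationary[s] for s in allowed}
    # Propagate n-1 steps while staying inside the allowed states.
    for _ in range(n - 1):
        weights = {
            t: sum(weights[s] * transition[s][t] for s in allowed)
            for t in allowed
        }
    return sum(weights.values())


# Two-state example with stationary distribution (5/6, 1/6); symbol 1 has
# positive probability, so avoiding it becomes exponentially unlikely.
T = [[0.9, 0.1], [0.5, 0.5]]
pi = [5 / 6, 1 / 6]
short_run = avoidance_probability(T, pi, symbol=1, n=1)
long_run = avoidance_probability(T, pi, symbol=1, n=200)
```

In this example the avoidance probability equals $\frac{5}{6}\cdot 0.9^{\,n-1}$, which makes the convergence to zero explicit.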

5 The stationary case 

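So far we formulated the Theorems 5 and 7 for sets of ergodic states $\Psi$ resp. processes $P$ to be optimally separated from a reference state or process. These results can be easily extended to the general stationary situation.

Any stationary state $\Psi \in \mathcal{S}_{\mathrm{stat}}(\mathcal{A}^{\mathbb{Z}})$ can be represented as a mixture (ergodic decomposition)

\[ \Psi = \int_{\mathcal{S}_{\mathrm{erg}}(\mathcal{A}^{\mathbb{Z}})} \Xi \; \gamma_\Psi(d\Xi) \]

of ergodic states ($\mathcal{S}_{\mathrm{stat}}(\mathcal{A}^{\mathbb{Z}})$ is a Choquet simplex, $\mathcal{S}_{\mathrm{erg}}(\mathcal{A}^{\mathbb{Z}})$ is the corresponding set of extremal points, $\gamma_\Psi$ is a probability measure on the measurable space $[\mathcal{S}_{\mathrm{erg}}(\mathcal{A}^{\mathbb{Z}}), \mathfrak{B}(T_{\mathcal{A}^{\mathbb{Z}}})]$, with $T_{\mathcal{A}^{\mathbb{Z}}}$ denoting the weak-*-topology and $\mathfrak{B}(T_{\mathcal{A}^{\mathbb{Z}}})$ the corresponding Borel $\sigma$-field, cf. [21]). The measure $\gamma_\Psi$ is unique.

Now let $\Phi \in \mathcal{S}(\mathcal{A}^{\mathbb{Z}})$ be a state and $\Theta \subseteq \mathcal{S}_{\mathrm{stat}}(\mathcal{A}^{\mathbb{Z}})$ with the property that for any $\Psi \in \Theta$ the relative entropy rate $s(\Xi,\Phi)$ exists for $\gamma_\Psi$-almost all $\Xi$. We define the quantity

\[ \underline{s}(\Psi,\Phi) := \operatorname*{ess\,inf}_{\gamma_\Psi(d\Xi)} s(\Xi,\Phi), \]

and for $\Omega \subseteq \Theta$ the quantity

\[ \underline{s}(\Omega,\Phi) := \inf_{\Psi\in\Omega} \underline{s}(\Psi,\Phi). \]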
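To see how the essential infimum differs from the average, consider a classical toy model, which is an assumption made only for this illustration: a finite mixture of iid processes, whose ergodic components are the iid processes themselves and whose relative entropy rates with respect to an iid reference reduce to single-letter relative entropies. The helper names below are hypothetical.

```python
import math


def relative_entropy(p, q):
    """Classical relative entropy D(p||q) in nats (+inf if supp p is not in supp q)."""
    total = 0.0
    for p_x, q_x in zip(p, q):
        if p_x == 0:
            continue
        if q_x == 0:
            return math.inf
        total += p_x * math.log(p_x / q_x)
    return total


def ess_inf_rate(components, weights, reference):
    """Essential infimum over components carrying positive weight."""
    return min(relative_entropy(c, reference)
               for c, w in zip(components, weights) if w > 0)


def average_rate(components, weights, reference):
    """Weighted average of the component rates (zero-weight components ignored)."""
    return sum(w * relative_entropy(c, reference)
               for c, w in zip(components, weights) if w > 0)


# Mixture of a fair and a biased coin, compared with a fair-coin reference.
components = [[0.5, 0.5], [0.9, 0.1]]
weights = [0.5, 0.5]
reference = [0.5, 0.5]
lower = ess_inf_rate(components, weights, reference)
mean = average_rate(components, weights, reference)
```

Here one component coincides with the reference, so the essential infimum is zero although the average rate is strictly positive: no typical set of the mixture can be separated from the reference at a positive exponential rate.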

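Theorem 11 Let $\Phi$ be a state on $\mathcal{A}^{\mathbb{Z}}$ and $\Theta \subseteq \mathcal{S}_{\mathrm{stat}}(\mathcal{A}^{\mathbb{Z}})$ such that for each $\Psi \in \Theta$ and $\gamma_\Psi$-almost all $\Xi$ the pair $(\Xi,\Phi)$ satisfies the HP-condition.

Then the quantity $\underline{s}(\Psi,\Phi) \le +\infty$ exists for each $\Psi \in \Theta$, and to each subset $\Omega \subseteq \Theta$ and any $\eta > 0$ there exists a sequence $\{p_n\}_{n\in\mathbb{N}}$ of projections $p_n \in \mathcal{A}^{(n)}$, with

\[ \lim_{n\to\infty} \Psi^{(n)}(p_n) = 1, \quad\text{for all } \Psi \in \Omega \tag{28} \]

and

\[ \limsup_{n\to\infty} \frac{1}{n}\log \Phi^{(n)}(p_n) \le -\underline{s}(\Omega,\Phi) + \eta \tag{29} \]

if $\underline{s}(\Omega,\Phi) < \infty$, otherwise if $\underline{s}(\Omega,\Phi) = \infty$

\[ \limsup_{n\to\infty} \frac{1}{n}\log \Phi^{(n)}(p_n) \le -\frac{1}{\eta}. \tag{30} \]

Moreover, for each sequence of projections $\{p_n\}$ fulfilling (28) we have

\[ \liminf_{n\to\infty} \frac{1}{n}\log \Phi^{(n)}(p_n) \ge -\underline{s}(\Omega,\Phi). \tag{31} \]

Hence $-\underline{s}(\Omega,\Phi)$ is the lower limit of all achievable separation exponents.

Remark 12 If $\Phi$ is stationary and, moreover, $*$-mixing, the assumption of Theorem 11 is fulfilled with $\Theta = \mathcal{S}_{\mathrm{stat}}(\mathcal{A}^{\mathbb{Z}})$, according to Section 4.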

Proof. Let $\widetilde{\Omega} := \{\Xi \in \mathcal{S}_{\mathrm{erg}}(\mathcal{A}^{\mathbb{Z}}) : (\Xi,\Phi) \text{ satisfies the HP-condition and } s(\Xi,\Phi) \ge \underline{s}(\Omega,\Phi)\}$. The set $\widetilde{\Omega}$ is weak-*-measurable since it can be represented by a countable application of unions and intersections to local sets, defined via the measurable functions $S(\cdot,\Phi^{(n)})$ and $\beta_{\varepsilon,n}(\cdot,\Phi)$.

Let $p_n$ be chosen as in Theorem 5 with $\Omega$ there specified as $\widetilde{\Omega}$. Then (29) or (30) are trivially fulfilled. For any $\Psi \in \Omega$ we obtain by assumption

\[ \Psi^{(n)}(p_n) = \int_{\mathcal{S}_{\mathrm{erg}}(\mathcal{A}^{\mathbb{Z}})} \Xi^{(n)}(p_n)\, \gamma_\Psi(d\Xi) = \int_{\widetilde{\Omega}} \Xi^{(n)}(p_n)\, \gamma_\Psi(d\Xi). \tag{32} \]

Now for each $\Xi \in \widetilde{\Omega}$ the expression $\Xi^{(n)}(p_n) \in [0,1]$ tends to 1 by the choice of the projections $p_n$. Hence Lebesgue's theorem on dominated convergence guarantees (28).

On the other hand, for each sequence of projections $\{p_n\}$ fulfilling (28) the identity (32) (now applied to the given $p_n$) proves that, for each $\Psi \in \Omega$, $\Xi^{(n)}(p_n)$ tends to 1 in $\gamma_\Psi$-probability as $n \to \infty$. By the definition of $\underline{s}(\Omega,\Phi)$, to any $\eta > 0$ we may choose $\Psi$ in such a way that $\underline{s}(\Psi,\Phi) < \underline{s}(\Omega,\Phi) + \eta$. We show that

\[ \liminf_{n\to\infty} \frac{1}{n}\log \Phi^{(n)}(p_n) \ge -\underline{s}(\Psi,\Phi), \]

which implies (31) since $\eta$ can be chosen arbitrarily small. In fact, assume the existence of a sub-sequence $n'$ such that

\[ \lim_{n'} \frac{1}{n'}\log \Phi^{(n')}(p_{n'}) < -\underline{s}(\Psi,\Phi) - \delta, \quad \delta > 0. \tag{33} \]

Along that sub-sequence there is still convergence in $\gamma_\Psi$-probability of $\Xi^{(n')}(p_{n'})$ to 1. Since convergence in probability implies almost sure convergence of some sub-sequence, we find another sub-sequence $n''$ of $n'$ with $\lim_{n''} \Xi^{(n'')}(p_{n''}) = 1$ holding $\gamma_\Psi$-almost surely. Hence, in view of the definition of $\underline{s}(\Psi,\Phi)$ there is some $\Xi_0 \in \mathcal{S}_{\mathrm{erg}}(\mathcal{A}^{\mathbb{Z}})$ such that $(\Xi_0,\Phi)$ fulfils the HP-condition, $s(\Xi_0,\Phi) < \underline{s}(\Psi,\Phi) + \delta$, but $\lim_{n''} \Xi_0^{(n'')}(p_{n''}) = 1$. Now Theorem 5 applied to the case $\Omega = \{\Xi_0\}$ implies

\[ \liminf_{n''} \frac{1}{n''}\log \Phi^{(n'')}(p_{n''}) > -\underline{s}(\Psi,\Phi) - \delta, \]

which contradicts (33). ■

The classical case immediately follows (with $\gamma_P$ denoting the probability measure occurring in the ergodic decomposition of a stationary process $P$ and $\underline{h}(P,Q) := \operatorname*{ess\,inf}_{\gamma_P(dW)} h(W,Q)$, supposed that $h(W,Q)$ exists $\gamma_P$-almost surely):

Theorem 13 Let $Q \in \mathcal{P}(A^{\mathbb{Z}})$ be a process and $\Theta \subseteq \mathcal{P}_{\mathrm{stat}}(A^{\mathbb{Z}})$. Assume that for each $P \in \Theta$ and $\gamma_P$-almost all $W \in \mathcal{P}_{\mathrm{stat}}(A^{\mathbb{Z}})$ the relative entropy rate $h(W,Q)$ exists and $\beta_\varepsilon(W,Q) \le -h(W,Q)$ for all $\varepsilon \in (0,1)$.

Then the quantity $\underline{h}(P,Q) \le +\infty$ exists for all $P \in \Theta$, and to each subset $\Omega \subseteq \Theta$ and any $\eta > 0$ there exists a sequence $\{M_n\}_{n\in\mathbb{N}}$ of subsets $M_n \subseteq A^n$ with

\[ \lim_{n\to\infty} P^{(n)}(M_n) = 1, \quad\text{for all } P \in \Omega \tag{34} \]

and

\[ \limsup_{n\to\infty} \frac{1}{n}\log Q^{(n)}(M_n) \le -\underline{h}(\Omega,Q) + \eta \]

if $\underline{h}(\Omega,Q) < \infty$, otherwise if $\underline{h}(\Omega,Q) = \infty$

\[ \limsup_{n\to\infty} \frac{1}{n}\log Q^{(n)}(M_n) \le -\frac{1}{\eta}. \]

Moreover, for each sequence of subsets $\{M_n\}$ fulfilling (34) we have

\[ \liminf_{n\to\infty} \frac{1}{n}\log Q^{(n)}(M_n) \ge -\underline{h}(\Omega,Q). \]

Hence $-\underline{h}(\Omega,Q)$ is the lower limit of all achievable separation exponents.

Remark 14 If $Q$ is stationary and, moreover, $*$-mixing, the assumption of the Theorem is fulfilled with $\Theta = \mathcal{P}_{\mathrm{stat}}(A^{\mathbb{Z}})$, according to Section 4.
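In the simplest special case of Theorem 13, with $\Omega = \{P\}$ and both $P$ and $Q$ iid over a binary alphabet, the statement can be checked by exact computation. The Python sketch below is an illustration under these added assumptions; the choice of the typical sets $M_n$ as windows of the empirical frequency, the window width `delta`, and all function names are hypothetical choices for the example.

```python
import math


def log_binomial_pmf(n, k, p):
    """log of C(n,k) p^k (1-p)^(n-k), via lgamma to avoid overflow."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))


def log_prob_typical_set(n, p_center, delta, p_true):
    """log-probability, under iid Bernoulli(p_true), of the set
    M_n = {x in {0,1}^n : |#ones/n - p_center| <= delta}."""
    logs = [log_binomial_pmf(n, k, p_true)
            for k in range(n + 1) if abs(k / n - p_center) <= delta]
    top = max(logs)
    return top + math.log(sum(math.exp(v - top) for v in logs))  # log-sum-exp


def kl_bernoulli(p, q):
    """Relative entropy of Bernoulli(p) with respect to Bernoulli(q), in nats."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


n, p, q, delta = 2000, 0.3, 0.6, 0.02
typicality = math.exp(log_prob_typical_set(n, p, delta, p))  # P^(n)(M_n)
exponent = -log_prob_typical_set(n, p, delta, q) / n          # -(1/n) log Q^(n)(M_n)
```

The observed exponent lies slightly below the relative entropy of $P$ with respect to $Q$; the gap plays the role of $\eta$ in the theorem and shrinks when the window `delta` is narrowed while $n$ grows.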

6 The quantum Shannon-McMillan theorem for 
stationary states and other corollaries 

As announced in the introduction, several earlier results on typical subspaces resp. subsets are contained in Theorem 11 in a version extended to the stationary case. We emphasize that the initial versions of the quantum Shannon-McMillan theorem, Kaltchenko-Yang universality and the quantum Stein's lemma were important ingredients in our proof. Also, it should be mentioned that it is not difficult to prove the stationary case of the quantum Shannon-McMillan theorem directly from the Kaltchenko-Yang result, without using the quantum Sanov theorem.

Corollary 15 (Quantum Shannon-McMillan theorem for stationary states)
Let $\Psi \in \mathcal{S}_{\mathrm{stat}}(\mathcal{A}^{\mathbb{Z}})$ be a stationary state and $\Psi = \int_{\mathcal{S}_{\mathrm{erg}}(\mathcal{A}^{\mathbb{Z}})} \Xi\, \gamma_\Psi(d\Xi)$ be its ergodic decomposition. Then there exists a sequence $\{p_n\}$ of projections in $\mathcal{A}^{(n)}$, respectively, such that

• $\lim_{n\to\infty} \Psi^{(n)}(p_n) = 1$ (typicality)

• $\lim_{n\to\infty} \frac{1}{n}\log \mathrm{Tr}\, p_n = \operatorname*{ess\,sup}_{\gamma_\Psi(d\Xi)} s(\Xi) =: \overline{s}(\Psi)$ (maximal ergodic entropy rate).

For any sequence $p_n$ with $\lim_{n\to\infty} \Psi^{(n)}(p_n) = 1$ we have

\[ \liminf_{n\to\infty} \frac{1}{n}\log \mathrm{Tr}\, p_n \ge \overline{s}(\Psi) \quad\text{(optimality).} \]

Remark 16 We emphasize that the AEP does not hold in the stationary case.

Also, observe that the relevant notion in the stationary case is not the von Neumann entropy rate $s(\Psi)$ of the state $\Psi$, being the average of the entropy rates of the ergodic components of $\Psi$, but their essential supremum $\overline{s}(\Psi)$.

Proof. Let $\Phi$ be the tracial state in $\mathcal{S}(\mathcal{A}^{\mathbb{Z}})$. It is $*$-mixing (even iid). Apply Theorem 11 with $\Omega = \{\Psi\}$. This yields a sequence $\{p_n^{(\eta)}\}$ of $\Psi$-typical projections with

\[ \overline{s}(\Psi) \le \liminf_{n\to\infty} \frac{1}{n}\log \mathrm{Tr}\, p_n^{(\eta)} \le \limsup_{n\to\infty} \frac{1}{n}\log \mathrm{Tr}\, p_n^{(\eta)} \le \overline{s}(\Psi) + \eta \]

for any $\eta > 0$. Now the assertion of the Theorem easily follows, since $\Omega$ is a finite set. ■
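The mechanism behind this proof is that relative entropy with respect to the tracial (uniform) reference equals $\log d$ minus the entropy, so an essential infimum of relative entropy rates turns into an essential supremum of entropy rates. The Python sketch below illustrates this in a classical toy model, which is an assumption for the example only: a mixture of iid sources over an alphabet of size $d$, with the uniform distribution standing in for the tracial state. All names are hypothetical.

```python
import math


def entropy(p):
    """Shannon entropy in nats."""
    return -sum(x * math.log(x) for x in p if x > 0)


def relative_entropy_to_uniform(p):
    """D(p || uniform) = log d - H(p), computed directly from the definition."""
    d = len(p)
    return sum(x * math.log(x * d) for x in p if x > 0)


# Mixture of a fair and a biased coin with equal weights (illustrative values).
components = [[0.5, 0.5], [0.9, 0.1]]
weights = [0.5, 0.5]
d = 2

max_entropy = max(entropy(c) for c, w in zip(components, weights) if w > 0)
avg_entropy = sum(w * entropy(c) for c, w in zip(components, weights))
ess_inf_rel = min(relative_entropy_to_uniform(c)
                  for c, w in zip(components, weights) if w > 0)
```

In this example the growth rate of the typical set is governed by the fair-coin component ($\log 2$), not by the smaller average entropy, in line with Remark 16.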

The next corollary extends the universality result of [14] to stationary states:

Corollary 17 (Kaltchenko-Yang universality theorem for stationary states)

Let $\Omega_s := \{\Psi \in \mathcal{S}_{\mathrm{stat}}(\mathcal{A}^{\mathbb{Z}}) : \overline{s}(\Psi) < s\}$. Then there exists a sequence $\{p_n\}$ of projections in $\mathcal{A}^{(n)}$, respectively, such that

• $\lim_{n\to\infty} \Psi^{(n)}(p_n) = 1$ for each $\Psi \in \Omega_s$ (typicality)

• $\lim_{n\to\infty} \frac{1}{n}\log \mathrm{Tr}\, p_n = s$ (maximal ergodic entropy rate).

For any sequence $p_n$ with $\lim_{n\to\infty} \Psi^{(n)}(p_n) = 1$, $\Psi \in \Omega_s$, we have

\[ \liminf_{n\to\infty} \frac{1}{n}\log \mathrm{Tr}\, p_n \ge s \quad\text{(optimality).} \]

Proof. Let $\Phi$ again be the tracial state in $\mathcal{S}(\mathcal{A}^{\mathbb{Z}})$. Apply Theorem 11 in a similar way as in the proof of Corollary 15 to the sets $\Omega_{s-\eta}$, $\eta > 0$. ■

Remark 18 Observe that the condition $\overline{s}(\Psi) < s$ defining $\Omega_s$ cannot be replaced by $s(\Psi) < s$.

Finally, the quantum Stein's lemma [3] is extended to the case where the null hypothesis state $\Psi$ is only assumed stationary and the reference state $\Phi$ fulfills the HP-condition with respect to almost all ergodic components of $\Psi$ (and the relative entropy rate $s(\Psi,\Phi)$ may be infinite):

Corollary 19 (Stein's lemma for stationary states)

Let $\Phi \in \mathcal{S}(\mathcal{A}^{\mathbb{Z}})$ and $\Psi \in \mathcal{S}_{\mathrm{stat}}(\mathcal{A}^{\mathbb{Z}})$ such that for $\gamma_\Psi$-almost all $\Xi$ the HP-condition is fulfilled for $(\Xi,\Phi)$. Then there exists a sequence $\{p_n\}$ of projections with

• $\lim_{n\to\infty} \Psi^{(n)}(p_n) = 1$ (typicality)

• $\lim_{n\to\infty} \frac{1}{n}\log \Phi^{(n)}(p_n) = -\underline{s}(\Psi,\Phi)$ (achievability of the separation exponent $-\underline{s}(\Psi,\Phi)$).

For any sequence $p_n$ with $\lim_{n\to\infty} \Psi^{(n)}(p_n) = 1$ we have

\[ \liminf_{n\to\infty} \frac{1}{n}\log \Phi^{(n)}(p_n) \ge -\underline{s}(\Psi,\Phi) \quad\text{(optimality).} \]

Remark 20 Note that the relative AEP does not hold in Stein's lemma in the stationary case.

Again, the relevant quantity in the stationary situation is not the average relative entropy rate $s(\Psi,\Phi)$, but the essential infimum $\underline{s}(\Psi,\Phi)$.

Proof. With $\Omega$ consisting of a single state $\Psi$ only, we may proceed in the same way as in the proof of Corollary 15. ■

7 Conclusions



The paper is devoted to a generalization of Sanov's theorem from the iid clas- 
sical situation to the correlated case and, moreover, to the quantum setting. In 
the present form, the main result comprises and extends several earlier asser- 
tions including the (quantum and classical) Shannon-McMillan theorem, Stein's 
Lemma (with relative AEP), Kaltchenko and Yang's universality and, of course, 
a version of Sanov's theorem itself. It is a continuation of [2], where the uncorrelated
case is considered. It has to be pointed out again (see [2]) that any
attempt to formulate a quantized version of Sanov's result has to face the
problem that the very notion of a trajectory and its empirical distribution is 
problematic in quantum mechanics. Sanov's classical theorem claims that for 
an iid process with marginal Q the probability to produce a trajectory with the 
empirical distribution belonging to some set $\Omega$ of probability measures is (in
general) exponentially small. The corresponding rate is specified as the minimal
relative entropy between $Q$ and the distributions in $\Omega$. In the interesting
case the measure $Q$ is of course not an element of $\Omega$ or its topological closure.
So it is a large deviation result: the typical behaviour of Q-trajectories is to 
have an empirical distribution close to Q. Whatever one tries to adopt as a 
quantum substitute for the empirical distribution, the natural choice in the case 
of a tensor product of vector states $v \otimes v \otimes \dots \otimes v$ should be $v$ itself. This
leads into the problem that for a reference vector state $w^{\otimes n}$ the probability of
measuring an 'empirical state' $v$ is at least $\mathrm{Tr}\, P_{w^{\otimes n}} P_{v^{\otimes n}} = |\langle w|v\rangle|^{2n}$, while the
relative entropy of $v$ with respect to $w$ is infinite, which would imply a super-exponential
decay; for a more detailed exposition see [2]. In this situation it proves useful
to look at Sanov's theorem as an assertion about the likelihood of observing the 
classical iid process given by Q far from its original support in the vicinity of the 
supports of other iid processes. The most natural choice for 'typical support' 
in the classical framework is the set of trajectories with empirical distribution 
close to the given probability distribution, since according to the individual er- 
godic theorem the empirical distribution tends to Q with probability one. So 
Sanov's theorem in its original form says that the probability of observing the 
trajectory in the typical support of other distributions, concretely specified by 
means of the corresponding empirical distributions, vanishes at a rate given by 
the minimum relative entropy. It is of course completely legitimate to insist on 
the point of view, that a quantum Sanov's theorem should be about empirical 
distributions, too (see [17], Remark 4; see also an attempt to formulate a quantum
(iid) Sanov theorem made in Segre's Ph.D. thesis [23] (2004), Conjecture
7.3.1). But, as explained, then one loses the relation to the established form of
quantum relative entropy (Umegaki's relative entropy). We chose to 'sacrifice'
empirical distributions in our approach while nonetheless calling it a version of
Sanov's theorem: in the classical case the relative entropy is not only the rate of
separation when empirical distributions as specifying typical sets are considered. 
It has a clear operational meaning as the optimal separation rate, whatever one 
considers as typical support in the sense that the probability goes to one. This 
perception of Sanov's theorem, closely connected with the statistical hypothesis
testing aspect, appears to be natural. It allows useful generalizations to the
correlated and quantum cases.
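The obstruction described above for pure product states can be made explicit numerically. The Python sketch below is an illustration under stated assumptions: the vectors $v$ and $w$ are taken as real unit vectors in the plane at angle `theta`, and the function names are hypothetical. It contrasts the merely exponential decay of the overlap $|\langle w|v\rangle|^{2n}$ with the infinite Umegaki relative entropy between the two distinct pure states.

```python
import math


def overlap_probability(theta, n):
    """|<w|v>|^(2n) for real unit vectors v, w in the plane at angle theta."""
    return math.cos(theta) ** (2 * n)


def exponential_decay_rate(theta):
    """Rate r with |<w|v>|^(2n) = exp(-n r); finite unless v is orthogonal to w."""
    return -2.0 * math.log(abs(math.cos(theta)))


def pure_state_relative_entropy(theta):
    """Umegaki relative entropy between the pure states |v><v| and |w><w|:
    zero if they coincide, +inf otherwise (support of v is not inside support of w)."""
    return 0.0 if math.isclose(abs(math.cos(theta)), 1.0) else math.inf


theta = math.pi / 3          # illustrative angle, |<w|v>| = 1/2
n = 10
empirical_rate = -math.log(overlap_probability(theta, n)) / n
```

The finite rate $-2\log|\langle w|v\rangle|$ is incompatible with the infinite relative entropy, which is the mismatch that motivates formulating the quantum result in terms of typical supports rather than empirical states.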